PREFACE


This report describes part of a comprehensive and continuing program
of research concerned with advancing the state of the art in remote
sensing of the environment from aircraft and satellites. The research
is being carried out for NASA's Lyndon B. Johnson Space Center (JSC),
Houston, Texas, by the Environmental Research Institute of Michigan
(ERIM). The basic objective of this multidisciplinary program is to
develop remote sensing as a practical tool to provide the planner and
decision-maker with extensive information quickly and economically.

Timely information obtained by remote sensing can be important to 
such people as the farmer, the city planner, the conservationist, and 
others concerned with problems such as crop yield and disease, urban 
land studies and development, water pollution, and forest management. 

The scope of our program includes:

1. Extending the understanding of basic processes. 

2. Discovering new applications, developing advanced remote- 
sensing systems, and improving automatic data processing 
to extract information in a useful form. 

3. Assisting in data collection, processing, analysis, and 
ground-truth verification. 

The research described herein was performed under NASA Contract
NAS9-15476 and covers the period from May 15, 1977 through November 14,
1978. I. Dale Browne/SF3 was the NASA Contract Technical Monitor. The
program was directed by Richard R. Legault, Vice-President of ERIM and 
Head of the Infrared & Optics Division, Quentin A. Holmes, Program Manager, 
and Robert Horvath, Head of the Analysis Department. During a major 
portion of the program Richard F. Nalepka was ERIM’s Principal Investigator. 

The contract work was divided into several tasks. Work on two tasks
is reported elsewhere: yield forecasting procedures incorporating Landsat
data in Reference 36, and analysis of color image products in Reference 37.

During the final quarter of the contract year, the work carried out 
on the remainder of the tasks was focused on the development of a multi- 
crop acreage estimation system, Procedure M. This final contract report 
describes and evaluates that procedure and the components derived from 
the preceding work. 

The authors of this report (listed alphabetically) are R. Cicone,
E. Crist, R. Kauth, P. Lambeck, W. Malila, and W. Richardson. Significant
software support was provided by D. Rice. In addition, the following
members of the ERIM staff contributed to the reported work: R. Balon,
J. Gleason, S. Lindner, B. McCann, J. More, O. Mykolenko, J. Ott, and
T. Wessling. Consultation provided by E. Jebe, R. Hieber, W. Holsztynski,
H. Horwitz, F. Pont, and G. Suits is gratefully acknowledged, and
appreciation is expressed to D. Dickerson for her secretarial support.

1

INTRODUCTION 

Procedure M is a technique for estimating acreages of multiple crops
based upon remotely sensed data. It embodies techniques and viewpoints
developed at ERIM and throughout the research community during the last
several years under the stimulus of the Large Area Crop Inventory
Experiment (LACIE). This report describes the development and testing
of Procedure M as configured for spring wheat and other spring small
grains. LACIE was designed to estimate the production of wheat. The
techniques employed by LACIE were found to reliably estimate winter
wheat and spring small grains. The further estimation of spring wheat
production using Landsat multispectral scanner data, in the face of the
spectral similarity of the other spring grains, has been recognized as
one of the most difficult problems brought to the fore by the LACIE
program.

Before proceeding to the details of Procedure M and its testing, a 
broad context for the development of large-scale remote sensing techniques
is discussed and a perspective of Procedure M is given in terms of that 
context and the background of research in agricultural remote sensing. 

1.1 GENERAL CONTEXT 

The broad class of systems which may be used to affect, control,
or monitor the environment can be called environmental management
systems. In the most general terms, an environmental management system
consists of an information gathering system, a forecasting system, a
decision making system, and an action taking system, as shown in
Figure 1.1.

Briefly, an information gathering system obtains data regarding 
both the current state of the environment and actions that affect the 
environment. A forecasting system requests and obtains information from 
the information system and, in view of a specific set of planned actions 
and a likely set of unplanned actions, produces an objective prediction 
of the future environmental state. The decision making system hypothesizes
a set of planned actions and obtains predictions of the environmental
state from the forecasting system. It decides among alternative
sets of actions. The action taking system carries out the planned actions
and reports actions as they occur.

FIGURE 1.1 AN ENVIRONMENTAL MANAGEMENT SYSTEM

Because of long lead times for technology development, it is natural 
to first develop the information gathering component of an environmental 
management system, then the forecasting component, and last of all to
create the possibility for coherent planned action by introducing a 
decision making system. In developing the information and forecasting 
systems it is wise to consider the characteristics needed when operating 
in conjunction with a decision making system. Notably these are accuracy, 
objectivity, and timeliness. 

By accuracy we mean that the system error distribution must be small 
enough that the outputs are useful for decision makers. Practically 
speaking this means that any particular system forecasting capability 
will be validated by independent test before being accepted as part of 
the system, and that the acceptance criteria will be set so that the 
system has a high likelihood of performing with useful accuracy. 

By objectivity we mean, basically, believability. Some of the
procedures which ensure objectivity are that the forecasting process is
visible to the decision makers in all essential elements, that the
forecasts arise from fixed procedures applied to a data base, that the data
base be subject to a rigorous quality assurance procedure, that the 
actual quantities forecasted are quantities that will subsequently be 
known with accuracy significantly better than the forecast accuracy, 
that the system publishes its estimated error distribution along with 
its forecasts, and that the system publishes posterior comparisons of 
its forecasts with the subsequently known forecasted quantities. 

By timeliness we mean that a forecasting system produces regular 
predictions of a set of forecasted quantities. In addition, considering 
the way decision making processes usually proceed, the forecasting system
is likely to be called upon to produce special reports in a near-real-time
mode. This emphasizes a special need for a large, quality assured, data 
base, only a sample of which is routinely accessed for regular scheduled 
forecasts. 

1.2 BACKGROUND IN AGRICULTURAL REMOTE SENSING 

An important aspect of the world environment is the state of agri- 
culture — the amount and kind of food products available region by 
region throughout the world. For many years there has been a gradual 
development by the U.S. Department of Agriculture (USDA) of all aspects 
of an environmental management system in the United States regarding 
domestic agriculture. Regarding foreign agriculture, only the informa- 
tion gathering and forecasting functions have been attempted by the USDA. 

In the last several years, remote sensing techniques have been under
development to assist significantly in information gathering for
numerous types of environmental management
problems. The National Aeronautics and Space Administration (NASA) in 
particular has supported the development of aircraft and spacecraft 
remote sensing instruments and information extraction techniques. ERIM 
has been deeply involved in this effort, developing the first airborne 
multispectral scanners [1,2] and having a continuous 15-year history of 
improving instruments and increasing understandings of the underlying 
physical processes and the techniques of processing the data to obtain 
the desirable information from it [3-11]. 

Specific applications to agricultural problems have been initiated 
and led by NASA's Johnson Space Center (JSC) over the past decade. One 
of these was the Corn Blight Watch Experiment (CBWE) (1970), with air- 
borne scanner data and photography [12]. The purpose of the CBWE was 
to track the spread of the Southern Corn Leaf Blight northward across 
the nation. 

With the launch of the Earth Resources Technology Satellite (now
Landsat) in July of 1972, it became possible to consider the application 
of the spaceborne Multispectral Scanner (MSS) data to the task of Crop 
Production Forecasting over world or national regions. An early attempt 
was the Crop Identification Technology Assessment for Remote Sensing 
project (CITARS) [13]. This project involved efforts by the Earth 
Observations Division of the Johnson Space Center (JSC), Purdue
University's Laboratory for Applications of Remote Sensing (LARS), and
ERIM in an intensive effort to apply then current state-of-the-art infor- 
mation extraction techniques in an evaluation of the feasibility of inven- 
torying corn and soybeans in Indiana and Illinois. 

The possibility of using Landsat plus collateral data to monitor
wheat production in the world's major wheat producing regions arose
out of the experience gathered in CITARS and elsewhere, plus the
occurrence and impact of major wheat crop failures around the world. The
Large Area Crop Inventory Experiment (LACIE) was initiated by NASA and
carried out jointly with the USDA and the National Oceanic and
Atmospheric Administration (NOAA) to test the feasibility of using
Landsat MSS data, weather data, and historical data to estimate the
production of wheat at harvest in seven major wheat producing countries
[14]. LACIE ran through three phases, covering crop harvest years 1975
through 1977. Currently, in a transition year, the feasibility of
extending LACIE technology to the discrimination among spring small
grains and to the problem of production inventory of corn, soybeans, and
soft red wheat is being explored.

In each of these exercises, the attempt was to use and evaluate 
existing techniques and, in each case, the existing techniques were 
found wanting in some respects. That this would be true was recognized 
in advance. One of the stated purposes of LACIE was to "research and
develop alternate approaches and techniques ... where required to meet
performance goals ..." [15]. And indeed there has been substantial
growth in the technology of information extraction during the LACIE 
program. 

At JSC, Procedure 1, which embodies a fundamental re-thinking of
the methods of using remotely sensed data in estimation procedures, was
developed and implemented in LACIE by NASA/EOD and Lockheed Electronics
Company (LEC) personnel [16,17]. Among other contributors supported
by LACIE are LARS, UCB, and ERIM. LARS provided field measurements data 
for the development of detailed insights into the multitemporal-spectral 
description of crop canopies, and has advanced the art of sampling design 
for remote sensing surveys. The Remote Sensing Program at the University 
of California at Berkeley (UCB) has developed advanced techniques of 
photointerpretation, sampling designs, and partitioning. 

Several of ERIM's tasks have been in developing advanced techniques 
for acreage estimation, including preprocessing techniques, training 
techniques, and unbiased sampling and estimation techniques. 

These have been incorporated into Procedure M, a procedure for acreage 
estimation of multiple crops which further develops the basic approach 
of Procedure 1. 

A viewpoint that has been reinforced by the LACIE experience is the 
essential need for validation of the estimation procedures. In addition 
to its estimated quantities, as stated above, we believe that every fore- 
casting or estimation system ought to produce estimates of the error dis- 
tribution of its forecasts. We have attempted to follow this philosophy 
in the development of Procedure M. One of the most valuable legacies of
LACIE is a large supply of accurate ground truth information and associated
Landsat data, and in-place procedures for continuing to acquire more of it.
Without such data, tests of the types described in this report are impossi- 
ble. In our view, real progress in the development of remote sensing is 
now fully dependent on such tests. 

Section 2 describes Procedure M and its components. Then Section 3 
describes both overall performance evaluations and component evaluations, 
while a summary and conclusions are presented in Section 4. 

2

DESCRIPTION OF PROCEDURE M 


Procedure M is a research system for performing crop area (proportion)
estimation based on labels assigned to samples of multispectral scanner
data by ground truth, by analysts, by machine/analyst combination, or by
machine. It can operate in the LACIE framework and is a multicrop
generalization of the previously developed Procedure B [18,19]. Between
segment selection and final aggregation, the six major steps of the
procedure are data preprocessing and selection, spatial feature definition,
data stratification, sampling of entities for labeling, labeling, and
proportion estimation, as shown in Figure 2.1. Each of these steps uses
state-of-the-art techniques. However, the system is modular so that it
can easily be modified, configured for different purposes, or used as
a test bed to evaluate alternative components or groups of components.

The key elements of the two-class Procedure B were used as the basis
for Procedure M, because their functioning is understood and test results
have been good, showing nearly unbiased proportion estimates using
ground truth labels [20]. In generalizing the elements to multiple crops,
a number of improvements were made in the overall design and in various 
components and their implementation. 

2.1 OVERALL DESCRIPTION OF PROCEDURE M CONFIGURED FOR SPRING WHEAT 

Procedure M was configured initially for the problem of inventorying
spring wheat and other small grains, through incorporation of a two-step
procedure for discriminating between (i.e., labeling) spring wheat and
other spring small grains data.* This two-step procedure utilizes analyst
interpretation to distinguish between the 'Spring Small Grain' and 'Other'
classes and a machine algorithm to further distinguish between 'Spring
Wheat' and 'Other Spring Small Grain'.

*Spring wheat, spring barley, oats, rye, and triticale were considered
to form the 'Spring Small Grain' class.

FIGURE 2.1 BLOCK DIAGRAM FOR PROCEDURE M IN LACIE CONTEXT

2.1.1 COMPARISON OF LACIE PROCEDURE 1 AND PROCEDURE M FOR
SPRING WHEAT

At this point in the discussion, it is appropriate to identify the
major similarities and differences between LACIE's Procedure 1 and
Procedure M for spring wheat inventory, both operating in the LACIE
framework. Points of comparison are presented in Table 2.1. The major
differences are that the Procedure M configuration includes more
preprocessing, defines and labels quasi-fields rather than individual
pixels, uses a different sampling strategy, incorporates a machine
labeler for distinguishing between spring wheat and other spring small
grains, and does not use maximum likelihood classification to produce
the crop proportion estimates.

2.1.2 GENERAL DESCRIPTION OF COMPONENTS OF PROCEDURE M FOR 
SPRING WHEAT 

An overview and general description is given below for each of the 
components (modules) of Procedure M as configured and tested for spring 
wheat inventory. Details of five of the key components on which sub- 
stantial development work was performed during the contract year are 
presented later in Section 2.2, while specifics of the configuration 
tested are presented in Section 3.1.1 and appendices. The discussion 
below follows the sequence of data processing operations in Procedure M 
and starts at the point, in LACIE, where sample segments have been allocated 
and selected. 

The first data operations involve preprocessing to screen, normalize,
and transform the Landsat data for subsequent selection and processing,
as indicated in Table 2.2. The screening operation flags garbled data
and data from clouds, cloud shadow, and water, and computes a haze
diagnostic parameter. This diagnostic parameter is then used with the
spatially varying XSTAR algorithm (discussed more fully in Sections 2.2.5
and 3.2.4) to adjust for variations in atmospheric haze across the scene
and normalize the data to a reference atmospheric condition and reference
sun angle. Correction for the response of different Landsat MSS sensors
also is incorporated. These normalizations increase the stability and
interpretability of the data and reduce scene-to-scene variability. The
final preprocessing step is a transformation of the corrected Landsat
channel values to Tasseled-Cap space, using linear combinations [21,22].
The first two combinations or principal directions, Brightness and
Greenness, contain a majority of the variability and information in the
Landsat data and have physical meaning.

TABLE 2.1 COMPARISON OF LACIE PROCEDURE 1 AND PROCEDURE M
FOR SPRING WHEAT

SIMILARITIES

• Use of 5x6-mile LACIE Segments of Landsat Data
• Use of Analyst for Labeling 'Spring Small Grain' vs. 'Other'
• Labeling of a Sample of Data from Each Segment

MAJOR DIFFERENCES

Function                   LACIE Procedure 1           Spring Wheat Procedure M

• Preprocessing            Sun Angle Correction        Sun Angle Correction
                           Analyst Screening           Machine Screening
                                                       Satellite Calibration
                                                       Haze Correction
                                                       Tasseled-Cap Data
                                                         Transformation

• Entities to              Pixels                      Interiors of Quasi-Fields
  be Labeled

• Sample Selection         Fixed Selection from        Random Selection from
  for Labeling               209-Dot Grid                40 Spectral Strata

• Labeling of              Analyst                     Machine Algorithm
  'Spring Wheat'
  vs. 'Other Spring
  Small Grain'

• Proportion               Maximum Likelihood Ratio    Aggregation of Stratified
  Estimation                 Classification for          Sample Estimate
                             Two-Class Stratification,
                             followed by Bias
                             Correction

TABLE 2.2 LANDSAT DATA PREPROCESSING AND SELECTION

• Screen to Exclude or Flag:
  - Bad Data
  - Clouds
  - Cloud Shadows
  - Water

• Compute Haze Diagnostic

• Correct for Landsat MSS Sensor Calibration

• Correct for Sun Angle and for Atmospheric Haze
  (Spatially Varying XSTAR Algorithm)

• Transform Data: Tasseled-Cap Linear Combinations

• Select Segments (Analyst). Criteria Are:
  - Acquisitions Exist for Adequate Separability of Spring
    Small Grains from Other Crops
  - Acquisitions That Will Provide a Good Definition of the
    Field Pattern Present

• Select Acquisitions (Analyst). Criteria Include:
  - Acquisition(s) at the Dough or Ripening Stage of Wheat
    Development
  - Acquisition(s) Near and After Peak Green Development Stage
  - Other Acquisitions that Provide Good Definition of Field
    Pattern and Spectral Separability of Spring Small Grains
    from Other Crops

• Select Spectral Features:
  - Brightness and Greenness, for Each Selected Acquisition Date

The other aspect of preprocessing is the selection of data for 
processing. As indicated in Table 2.2, this includes selection by the 
analyst of segments and acquisitions according to the stated criteria. 

Only the Brightness and Greenness spectral features for selected acqui- 
sitions are subsequently used in the procedure. 
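
As a rough illustration of the linear-combination step, the sketch below
computes Brightness and Greenness for one pixel from its four MSS channel
values. The coefficients are the commonly quoted Kauth-Thomas values for
Landsat MSS and are an assumption here; this report does not list the
coefficients it used (see [21,22]). The function names are hypothetical.

```python
# Commonly quoted Kauth-Thomas Tasseled-Cap coefficients for Landsat MSS
# bands 4-7. ASSUMPTION: this report does not list its own coefficients.
BRIGHTNESS = (0.433, 0.632, 0.586, 0.264)
GREENNESS = (-0.290, -0.562, 0.600, 0.491)


def tasseled_cap(mss_bands):
    """Return (brightness, greenness) for one pixel.

    mss_bands: four corrected channel values in band order MSS4..MSS7.
    """
    if len(mss_bands) != 4:
        raise ValueError("expected four MSS channel values")
    brightness = sum(c * v for c, v in zip(BRIGHTNESS, mss_bands))
    greenness = sum(c * v for c, v in zip(GREENNESS, mss_bands))
    return brightness, greenness


def segment_features(acquisitions):
    """Stack Brightness and Greenness for each selected acquisition date.

    acquisitions: list of four-band pixel values, one entry per date.
    Returns a flat feature list [B1, G1, B2, G2, ...].
    """
    features = []
    for bands in acquisitions:
        features.extend(tasseled_cap(bands))
    return features
```

With up to four selected acquisitions, each pixel or quasi-field is then
described by at most eight features, which is the feature set the later
steps would operate on under this reading.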

The second component detects spatial features in each scene 
(Table 2.3). An approximation of the field pattern is defined or ex- 
tracted from multitemporal data by using a clustering algorithm (BLOB) 
that employs both spectral and spatial variables [23]. A set of BLOB 
parameters which is optimized for various times of the growing season 
has been established and produces good results, as discussed further in 
Sections 2.2.4 and 3.2.3. The algorithm actually defines quasi-fields 
which frequently, but not necessarily, follow farm field boundaries. For 
instance, if two adjacent farm fields have the same or similar crops they 
may be assigned to the same quasi-field. Conversely, if some spectral 
anomaly, such as a bare area, is present within a farm field, two dif- 
ferent quasi-fields, one for the bare area and one for the remainder, may 
be assigned to it. 

A key next step in field definition is that of stripping away the 
edge pixels from each defined quasi-field. These pixels are the ones 
most likely to contain mixtures of two or more different crop types and 
are most susceptible to errors induced by spatial misregistration of data 
channels acquired on different dates. By eliminating these edge pixels 
from calculations of spectral data means and requiring the analyst to 
label only quasi-field interiors, we believe that major sources of analyst 
labeling errors are likewise removed. It remains only to demarcate for 
the analyst those quasi-field interiors that are to be labeled (See the 
example in Figure 2.2) and to count the number of pixels in each quasi-field. 
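
The edge-stripping step can be sketched as follows, assuming the
quasi-fields are available as a grid of integer labels. The 4-neighbor
rule used to decide what counts as an edge pixel is an assumption made for
illustration; the BLOB algorithm itself and its actual edge definition are
not reproduced here, and the function names are hypothetical.

```python
def interior_mask(labels):
    """Mark pixels whose 4-connected neighbors all share their label.

    ASSUMPTION: neighbors outside the image are ignored, so pixels on the
    image border can still count as interior.
    """
    rows, cols = len(labels), len(labels[0])
    mask = [[True] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    if labels[rr][cc] != labels[r][c]:
                        mask[r][c] = False
                        break
    return mask


def blob_summary(labels, values):
    """Per quasi-field: total pixels, interior pixels, interior mean."""
    mask = interior_mask(labels)
    acc = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            entry = acc.setdefault(lab, {"pixels": 0, "interior": 0, "sum": 0.0})
            entry["pixels"] += 1  # all pixels count toward field size
            if mask[r][c]:
                entry["interior"] += 1
                entry["sum"] += values[r][c]
    return {
        lab: {
            "pixels": e["pixels"],
            "interior_pixels": e["interior"],
            # None when a quasi-field is too small to have any interior.
            "interior_mean": e["sum"] / e["interior"] if e["interior"] else None,
        }
        for lab, e in acc.items()
    }
```

Note that the pixel count uses every pixel of the quasi-field, while the
spectral mean uses interior pixels only, matching the separation of roles
described above.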

TABLE 2.3 SPATIAL FEATURE DEFINITION

• Apply Spectral/Spatial Clustering
  - BLOB Algorithm
  - Up to Four Dates: Biostages 1-4
  - Brightness and Greenness Each Date

• Operate on Each Defined Quasi-Field (Blob)
  - Count Number of Pixels
  - Strip Away Edge Pixels
  - Compute Spectral Means
  - Display Boundary to Analyst (if selected)

• Results:
  - Quasi-Field Interiors Defined for Selection and Labeling
  - Effects of Mixture Pixels and Spatial Misregistration
    Minimized for Labeling

FIGURE 2.2 EXAMPLE MAP OF QUASI-FIELD (BLOB) INTERIORS

Because there may typically be 300 to 500 interiors defined by
BLOB for a LACIE segment and because it is not practical to require the
analyst to label them all, some sampling is required. In Procedure M for
spring wheat, the sample selection process has two stages: a spectral
stratification followed by sample allocation and selection.

The spectral stratification process as employed for spring wheat
is summarized in Table 2.4 and discussed more fully in Sections 2.2.3
and 3.2.2. In essence, it is a second clustering operation, this time
using only the spectral means of field interiors as the items to be
clustered. As noted in Table 2.4, two passes of the ERIM clustering
algorithm are utilized, which is a refinement of the stratification
process used previously in Procedure B. For the spring wheat inventory
problem, it was decided to define 40 spectral strata for each segment,
a number based on prior experience and results with Procedure 1 and
Procedure B. In a multisegment configuration, Procedure M stratification
could use collateral features as well as spectral features.
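
The two-pass idea (adapt the cluster means on a first pass, then assign
membership from the final means on a second pass) can be sketched as
below. This is not the ERIM clustering algorithm, whose seeding and
distance rules are not given in this section; seeding from the first
items and using Euclidean distance are assumptions, and the names are
hypothetical.

```python
import math


def nearest(means, item):
    """Index of the cluster mean closest to item (Euclidean distance)."""
    return min(range(len(means)), key=lambda k: math.dist(means[k], item))


def two_pass_cluster(items, n_strata):
    """Cluster spectral-mean vectors into n_strata groups in two passes.

    ASSUMPTION: the first n_strata items seed the clusters; the real
    algorithm's seeding rule is not stated in this section.
    """
    if len(items) < n_strata:
        raise ValueError("need at least as many items as strata")
    means = [list(item) for item in items[:n_strata]]
    counts = [1] * n_strata

    # Pass 1: adapt each cluster mean as the cluster grows.
    for item in items[n_strata:]:
        k = nearest(means, item)
        counts[k] += 1
        means[k] = [m + (x - m) / counts[k] for m, x in zip(means[k], item)]

    # Pass 2: assign membership using the final means from pass 1.
    labels = [nearest(means, item) for item in items]
    return labels, means
```

The second pass matters because an item assigned early in the first pass
may end up closer to a different cluster once the means have settled.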

We also decided to allocate and sample 100 quasi-fields for label- 
ing, among the 40 spectral strata (Table 2.5). The number of samples 
assigned to each stratum is made proportional to the size (total number 
of pixels) of the stratum. An unbiased method for choosing the samples 
allocated to each stratum was developed and is described in Section 2.2.2.
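
A minimal sketch of the proportional allocation is given below. The text
states only that the number of samples is proportional to stratum size;
the largest-remainder rounding rule used here to make the counts sum
exactly to the total is an assumption, and the function name is
hypothetical. The unbiased selection within each stratum (Section 2.2.2)
is not shown.

```python
def allocate_samples(stratum_sizes, n_samples=100):
    """Split n_samples among strata in proportion to pixel counts.

    stratum_sizes: integer pixel totals, one per stratum.
    ASSUMPTION: fractional quotas are resolved by largest remainder.
    """
    total = sum(stratum_sizes)
    if total <= 0:
        raise ValueError("strata must contain at least one pixel")
    # Integer arithmetic avoids floating-point rounding surprises.
    counts = [n_samples * size // total for size in stratum_sizes]
    remainders = [n_samples * size % total for size in stratum_sizes]
    leftover = n_samples - sum(counts)
    # Hand out the remaining samples to the largest fractional quotas.
    order = sorted(range(len(stratum_sizes)), key=lambda i: -remainders[i])
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

Under this rule a small stratum can receive zero samples, and a stratum
could in principle be allocated more samples than it has quasi-fields;
how the actual procedure treats those cases is not stated here.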

The most critical stage in crop inventory procedures is labeling
the crop type of the designated samples (fields or pixels). In the
configuration of Procedure M tested and described in this report, a two-step
labeling procedure is followed, as summarized in Table 2.6. In the first
step, the analyst labels each designated quasi-field as either 'Spring
Small Grain' or 'Other'. In the second, a machine algorithm operates on
those quasi-fields designated 'Spring Small Grain' by the analyst and
assigns a proportional label among two classes: 'Spring Wheat' or 'Other
Spring Small Grain'. If the acquisition needed to make this determination
is not available, the label 'Unidentifiable Spring Small Grain' is assigned.

TABLE 2.4 SPECTRAL STRATIFICATION OF SEGMENT

• Cluster Quasi-Field Spectral Means to Produce a Specified
  Number of Strata (B Clusters)
  - Employ ERIM Clustering Algorithm as Follows:
    • On a First Pass, Adapt Cluster Means as Clusters Grow
    • On a Second Pass, Assign Cluster Membership
      on Basis of Final Means from First Pass
  - Produce 40 Strata; Number Chosen Based on:
    • The Design and Experience of Procedure 1
    • Prior Procedure B Test Results

TABLE 2.5 ALLOCATION AND SAMPLING OF QUASI-FIELDS FOR LABELING

• Allocate 100 Quasi-Fields (Blobs) for Labeling
  Among the Strata with Number Proportional to
  Stratum Size (Total Number of Pixels)

• Choose the Quasi-Fields Allocated to Each Stratum
  in an Unbiased Manner


TABLE 2.6 LABELING PROCEDURE

Step 1: 'Spring Small Grain' vs. 'Other'

  - This Function Operationally is Performed by an Analyst
  - Ground Truth Was Used in Designating a Quasi-Field
    Label for T & E Purposes (A Quasi-Field was Labeled
    Grain if It was More Than 50% Grain)

Step 2: 'Spring Wheat' vs. 'Other Spring Small Grain'

  - Machine Algorithm is Automatically Applied to Each
    Quasi-Field Called 'Spring Small Grain' by Analyst
  - If the Proper Acquisition is Not Available, the
    Quasi-Field is Labeled 'Unidentifiable Spring
    Small Grain'
  - Otherwise, Two Classes are Designated Proportionally:
    • Spring Wheat
    • Other Spring Small Grain

Result: Each selected quasi-field is either labeled proportionally
        among the spring small grain classes or else is labeled 100%
        unidentifiable spring small grain or 100% non spring small grain.

For the initial testing and evaluation reported herein, ground 
observations were used as a substitute for analyst labels in Step 1. 

The machine algorithm of Step 2 has the key elements identified in 
Table 2.7. It is designed to capitalize on the fact that barley ripens 
more rapidly and/or somewhat differently than spring wheat and to detect 
the spectral manifestations of this process. Details of this algorithm 
are presented in Sections 2.2.1 and 3.2.1. 

The final step of Procedure M is to take the crop labels assigned
to the selected quasi-fields and use them to compute crop proportion
estimates for the segment. As indicated in Table 2.8, a proportional
label is computed for each spectral stratum, using the labeled
quasi-fields within it. The stratum proportions are then aggregated to
produce various segment proportion estimates. One is the two-class
estimate represented by the proportion of spring small grains. Another
is the three-class estimate, which divides the spring small grains class
into 'Spring Wheat' and 'Other Spring Small Grain'. The two-class
estimate will reflect the accuracy of analyst labeling. The three-class
estimate may be unreliable or have high variance if too few of the
analyst-labeled spring small grain fields have suitable acquisitions for
the machine labeler to further discriminate among them. A reliability
flag will accompany the estimates.

This concludes the general description of Procedure M for spring 
wheat. Details of five key components are presented in Section 2.2. 

2.2 DESCRIPTION OF SELECTED COMPONENTS OF PROCEDURE M 

Development aspects and characteristics of several key components 
of Procedure M, as configured for spring wheat, are described below. 

The order of discussion is the reverse of that in the preceding section:
we begin with the machine labeler and work backwards through the data flow
to the preprocessing phase that performs atmospheric haze correction. In
between are discussions of the unbiased sampling strategy, spectral
stratification, and spatial feature definition.

TABLE 2.7 ELEMENTS OF MACHINE ALGORITHM FOR DISCRIMINATING
AMONG SPRING SMALL GRAINS

• A Reference Profile of Green Development vs. Day of Year
  (Day of Peak Greenness is a convenient reference point)

• A Calculation of Crop Calendar Shift for Each Field or Pixel,
  Relative to the Reference Profile

• A Characteristic Distance, in the Brightness-Greenness Plane,
  for Each Date (Values Increase as the Grain Ripens)

• A Decision Threshold on the Computed Characteristic Distance,
  as a Function of Days Since Peak Greenness


TABLE 2.8 PROPORTION ESTIMATION

• Utilize Quasi-Field (Blob) Labels to Compute a Proportional Label
  for Each Spectral Stratum (B Cluster) (Based on Total Pixels)

• Generate Intermediate Segment-Level Proportion Estimates,
  Based on the Proportional Label and Total Number of Pixels
  in Each Stratum
  - Spring Wheat
  - Other Spring Small Grain
  - Unidentifiable Spring Small Grain
  - Other

• Adjust Intermediate Proportion Estimates by Partitioning
  Unidentifiable Spring Grains into 'Spring Wheat' and 'Other
  Spring Small Grains' in Accordance with Their Raw Proportions
  to Produce Final Proportion Estimates for the Segment:
  - Total Spring Small Grain Proportion
  - Spring Wheat Proportion; Other Spring Small Grain Proportion

• Flag Potentially High-Variance Spring Wheat Estimates
  (Based on Number of Quasi-Fields in Unidentifiable Spring
  Small Grain Class)
  - For Our Tests, Only a Total Spring Small Grains
    Estimate was Produced if More Than 50% of the
    Selected 'Spring Small Grain' Quasi-Fields Were
    Labeled 'Unidentifiable'
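
One reading of the steps in Table 2.8 is sketched below. The data layout,
class keys, and function names are hypothetical, and the pixel weighting
of quasi-field labels within a stratum is an interpretation of "Based on
Total Pixels" rather than a confirmed detail of the procedure.

```python
CLASSES = ("spring_wheat", "other_ssg", "unidentifiable_ssg", "other")


def stratum_label(labeled_blobs):
    """Pixel-weighted proportional label for one stratum.

    labeled_blobs: list of (pixel_count, {class_name: fraction}) pairs
    for the sampled quasi-fields in the stratum.
    """
    total = sum(pixels for pixels, _ in labeled_blobs)
    if total == 0:
        raise ValueError("stratum has no labeled quasi-fields")
    return {
        c: sum(pixels * label.get(c, 0.0) for pixels, label in labeled_blobs) / total
        for c in CLASSES
    }


def segment_estimates(strata):
    """strata: list of (stratum_pixel_count, labeled_blobs) pairs."""
    segment_pixels = sum(pixels for pixels, _ in strata)
    raw = dict.fromkeys(CLASSES, 0.0)
    ssg_blobs = unidentified_blobs = 0
    for stratum_pixels, blobs in strata:
        label = stratum_label(blobs)
        for c in CLASSES:
            raw[c] += stratum_pixels * label[c] / segment_pixels
        for _, blob_label in blobs:
            if blob_label.get("other", 0.0) < 1.0:  # a grain quasi-field
                ssg_blobs += 1
                if blob_label.get("unidentifiable_ssg", 0.0) == 1.0:
                    unidentified_blobs += 1

    identified = raw["spring_wheat"] + raw["other_ssg"]
    total_ssg = identified + raw["unidentifiable_ssg"]
    # Flag when more than half of the sampled grain quasi-fields could not
    # be split by the machine labeler (the 50% rule used in the tests).
    high_variance = ssg_blobs > 0 and unidentified_blobs / ssg_blobs > 0.5
    wheat = other_ssg = None
    if identified > 0 and not high_variance:
        # Split the unidentifiable share in the ratio of the raw proportions.
        wheat = total_ssg * raw["spring_wheat"] / identified
        other_ssg = total_ssg - wheat
    return {"raw": raw, "total_ssg": total_ssg, "spring_wheat": wheat,
            "other_ssg": other_ssg, "high_variance": high_variance}
```

When the flag is raised, the sketch returns only the total spring small
grain proportion, mirroring the rule stated for the tests in Table 2.8.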




2.2.1 DESCRIPTION OF MACHINE LABELER 

While the discrimination of spring small grains from other cover
types in the spring wheat configuration of Procedure M is carried out
by analyst interpreters, the finer discrimination of spring wheat from 
other spring small grains is entirely a machine function. This second 
phase, which begins when the analyst has identified those fields in the 
training sample that are spring small grains, is itself a two-step 
process: estimation of crop calendar shift, followed by label assignment. 


2.2.1.1 Estimation of Crop Calendar Shift


Basic Concept. Observed through time, a spring small grains pixel
or field should exhibit a pattern, in Tasseled-Cap Greenness, such as 
that represented in Figure 2.3(a). In the absence of system noise or 
other outside influences, one could reasonably expect that other spring 
small grains pixels, observed at identical points in time, would have a 
similar appearance if their growth stages were the same. However, a 
more common occurrence is illustrated in Figure 2.3(b), where observa- 
tions at the same point in time show a high degree of signal variation. 


The underlying assumption of crop calendar shift estimation is that 
a large part of this variation is the result of differences in stage of 
development at the time of observation. By fitting a model form to 
data like those in Figure 2.3(a) (See Figure 2.3(c)), and then shifting 
the model form along the day-of-year axis, we find that the sets of 
observations, while showing much variability on the day of acquisition, 
are in fact different points along a common curve form (Figure 2.3(d)), 
differing only in their stage of development at the time of observation. 
Conversely, by shifting each set of observations to a common reference 
time, within-day signal variability can be substantially reduced (Figures 
2.3(e) and (f)). In addition, since previous studies at ERIM [24,25] 


We gratefully acknowledge the work of Dr. Gautam Badhwar of NASA/JSC 
in first using spectral profiles to more closely estimate the stage of 
plant development. The shift procedure presented here is an extension 
of his work. 
FIGURE 2.3 BASIC CONCEPT OF CROP CALENDAR SHIFT 
BASED ON GREEN DEVELOPMENT PROFILE 
have suggested that effective spring wheat and barley separation can 
only be accomplished in a relatively narrow day span around the dough 
stage of development, application of the crop calendar shift allows for 
proper selection of the acquisition to use in the label assignment step. 
Figure 2.4 is an example of real data with and without the crop calendar 
shift applied. 

General Approach. The model form used in Procedure M for spring 
wheat is illustrated in Figure 2.5. A cross-correlation calculation, 
which is independent of differences in overall signal magnitude, is used 
as the goodness-of-fit criterion. In order to obtain a stable estimate 
of the shift to be applied, at least three acquisitions must fall in the 
time interval between plant emergence and harvesting, while additional 
acquisitions in this range result in a more accurate shift estimation. 

It should be noted that a single profile is used for all spring 
small grains and all sample segments. A study of differences in shift 
between more specific (segment or crop) profiles and more general (multi- 
segment or all small grains) profiles indicated that, for 85-95% of the 
pixels tested, shift differences were within a range attributable to 
noise (±2 days or less). In light of the added complexity introduced 
by either trying to adjust the reference profile for each new sample 
segment or testing with a different profile for each small grains crop, 
it is a significant advantage to be able to use one common profile. 

Implementation. The actual crop calendar shift estimation in 
Procedure M for spring wheat is carried out on two levels: field-by-field 
and then pixel-by-pixel. The field-level shift provides an approximation 
of the final shift for each pixel and also serves to identify those spring 
small grain quasi-fields lacking the required number of acquisitions 
within the reference time interval. Such fields are labeled 'unidenti- 
fiable spring small grains' and are removed from further consideration 
by the labeler. 
Model Form: $F(T) = A\,T^{B}\,e^{C T^{2}}$ 

where 

$F(T) = (\text{Greenness}) - 25$ 

$T = (\text{Day of Year}) - 125$ 

$A = 0.65163$ 

$B = 1.2957$ 

$C = -0.52415 \times 10^{-3}$ 


FIGURE 2.5 GREEN DEVELOPMENT PROFILE MODEL 
At the pixel level, each interior pixel of the given quasi-field 
is examined individually, and a fine tuning of the field shift is made. 
This second level of shift estimation provides for more accurate estima- 
tion of crop development for each pixel and also accommodates differences 
in stage of development within a field shape (as when several small fields 
are grouped together into one quasi-field by the spectral/spatial cluster- 
ing algorithm). 
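
A minimal sketch of a shift search of this kind follows. It assumes the Figure 2.5 profile has the form F(T) = A T^B exp(C T^2), which places peak Greenness near Day 160, and it uses a Pearson correlation as a stand-in for the cross-correlation criterion. The function names, the 30-day search range and the whole-day step are assumptions made for illustration, not details taken from the Procedure M software.

```python
import math

# Constants from Figure 2.5; the exponent form C*T**2 is an assumption.
A, B, C = 0.65163, 1.2957, -0.52415e-3


def profile_greenness(day_of_year):
    """Reference Greenness: F(T) + 25 with T = day_of_year - 125."""
    t = day_of_year - 125.0
    if t <= 0.0:
        return 25.0
    return 25.0 + A * t ** B * math.exp(C * t * t)


def correlation(xs, ys):
    """Pearson correlation, insensitive to overall signal magnitude."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return float("-inf")
    return sxy / math.sqrt(sxx * syy)


def estimate_shift(days, greenness, max_shift=30):
    """Whole-day shift that best aligns the observations with the profile.
    The reference day of an acquisition is its day of year plus the shift."""
    if len(days) < 3:
        raise ValueError("at least three acquisitions are needed")
    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        model = [profile_greenness(d + shift) for d in days]
        score = correlation(model, greenness)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift
```

The same search could be run first on quasi-field mean Greenness and then, over a narrower range centered on the field result, on each interior pixel.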

2.2.1.2 Label Assignment 

Background. Studies at ERIM over the past two years clearly demon- 
strated the marked spectral similarity of the various spring small grains, 
even when viewed multitemporally. However, our tests on both Phase 2 and 
Phase 3 LACIE data have also indicated that, in the acquisition most likely 
to correspond to the dough stage of plant development, barley tends to be 
somewhat more advanced, and spectrally brighter, than spring wheat [24,25]. 
After heading, barley fields seem to ripen at a faster rate and/or follow 
a short-cut in the trajectory illustrated in Figure 2.6 in Brightness- 
Greenness space, such that by the dough stage they are farther from the 
green arm (a line approximately parallel to the spectral path followed by 
developing green vegetation) than are spring wheat fields at the same 
point in time. This separability is lost as the fields complete the 
ripening process and begin to be harvested, primarily because the signa- 
tures at these times are considerably more variable. 

By providing an estimate of the actual stage of development of each 
pixel at each acquisition date, the crop calendar shift makes possible 
a clearer understanding of the spectral relationship between spring wheat 
and barley, and thus a more precise definition of the labeling criterion. 

It should be noted that the labeling criterion we have devised is 
based on spring wheat and barley separation. Too few rye or triticale 
fields were present in the training data to allow any decision logic to 
be established for these crops. Oats, too, occurred with significantly 
lower frequency than spring wheat and barley. In addition, limited 
past work had shown few spectral differences between spring wheat and 
oats [26]. 

Development. Four Phase 3 LACIE blind sites (1498, 1515, 1640, 1663) were 
used to develop the labeling criterion. These sites were chosen based 
on the availability of acquisitions in the reference time interval and 
around the dough stage of development, as well as the presence of suffi- 
cient numbers of spring wheat and barley pixels to adequately characterize 
the behavior of the two crops. Atmospheric corrections were applied using 
the spatially varying XSTAR algorithm and a crop calendar shift was esti- 
mated for each interior pixel. 

Examination of Brightness-Greenness scatter plots for each shifted 
day of the year after peak Greenness (Day 160 on the reference scale) 
suggested that adequate separability could be obtained on Reference 
Days 186 through 203. In this day range, optimum linear discriminant 
analysis was carried out, using greenness and brightness as the discrimi- 
nant variables. The decision lines selected by this process showed a 
marked similarity in slope from day to day, differing only in the value 
of their y-intercept, and had similar slopes and intercepts from segment 
to segment for each day. Accordingly, a reference line was defined 
having the same slope as the set of decision lines, and the distance 
from that line was used as the discriminant (see Figure 2.7). Optimum 
discriminant values were again calculated, using the newly defined dis- 
tance, and again the chosen values were very similar from segment to 
segment for each day in the chosen range. Further, the mean decision 
values for the four segments, when plotted against a time axis, defined 
a straight line. Thus the final decision measure, common to all four 
segments, was a simple linear function of day of year (after shifting) 
and distance from the reference line (see Figure 2.8). 
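
The structure of such a decision measure can be sketched as follows. Every numeric constant here is a hypothetical placeholder, since the reference line and the decision values appear only in Figures 2.7 and 2.8; the sign convention (larger distance meaning barley) and the choice to return no label outside Reference Days 186 through 203 are likewise assumptions.

```python
import math

# Hypothetical placeholders, NOT values from the report.
REF_SLOPE = -1.0        # slope of the reference line (Greenness vs. Brightness)
REF_INTERCEPT = 100.0   # Greenness intercept of the reference line
THRESHOLD_AT_186 = 2.0  # decision value on Reference Day 186
THRESHOLD_PER_DAY = 0.1 # change in decision value per reference day


def distance_from_reference_line(brightness, greenness):
    """Signed perpendicular distance from the reference line."""
    offset = greenness - (REF_SLOPE * brightness + REF_INTERCEPT)
    return offset / math.hypot(1.0, REF_SLOPE)


def label_pixel(brightness, greenness, reference_day):
    """Label a spring small grain pixel, or return None when the shifted
    acquisition day lies outside the range with adequate separability."""
    if not 186 <= reference_day <= 203:
        return None
    threshold = THRESHOLD_AT_186 + THRESHOLD_PER_DAY * (reference_day - 186)
    distance = distance_from_reference_line(brightness, greenness)
    return "barley" if distance > threshold else "spring wheat"
```

The point of the sketch is only that one slope, one reference line and one linear threshold in shifted day of year suffice to express the rule for all segments.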

Using this measure on the four segments from which it was developed, 
we achieved an average labeling accuracy for spring wheat and barley of 
75% to 85%. Table 2.9 gives a segment-by-segment breakdown of the results. 










TABLE 2.9 LABELING RESULTS FOR DEVELOPMENT SEGMENTS 


Segment   Estimated Label       True Spring Wheat   True Barley 

1498      Est. Spring Wheat            745               103 
          Est. Barley                  229               292 
          Correct                     76.5%             73.9% 

1515      Est. Spring Wheat           2125               379 
          Est. Barley                  445              1975 
          Correct                     82.7%             83.9% 

1640      Est. Spring Wheat           4397               827 
          Est. Barley                  891              1241 
          Correct                     83.2%             60.0% 

1663      Est. Spring Wheat           3624               282 
          Est. Barley                  442               858 
          Correct                     89.1%             75.2% 
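
The percent-correct entries of Table 2.9 can be recomputed from the pixel counts, as in the short check below. The dictionary layout and function name are illustrative, and the overall accuracy per segment is a derived quantity that does not appear in the table itself.

```python
# Pixel counts from Table 2.9, per segment, in the order:
# (est. wheat / true wheat, est. wheat / true barley,
#  est. barley / true wheat, est. barley / true barley)
TABLE_2_9 = {
    1498: (745, 103, 229, 292),
    1515: (2125, 379, 445, 1975),
    1640: (4397, 827, 891, 1241),
    1663: (3624, 282, 442, 858),
}


def percent_correct(ww, wb, bw, bb):
    """Return (wheat, barley, overall) percent correct for one segment."""
    wheat = 100.0 * ww / (ww + bw)      # true wheat pixels labeled wheat
    barley = 100.0 * bb / (wb + bb)     # true barley pixels labeled barley
    overall = 100.0 * (ww + bb) / (ww + wb + bw + bb)
    return wheat, barley, overall


for segment, counts in TABLE_2_9.items():
    wheat, barley, overall = percent_correct(*counts)
    print(f"{segment}: wheat {wheat:.1f}%  barley {barley:.1f}%  "
          f"overall {overall:.1f}%")
```

The recomputed wheat and barley figures agree with the tabulated ones to within about 0.1 percentage point.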
2.2.2 UNBIASED SAMPLING STRATEGY 

This section is concerned with how to choose a random sample of 
quasi-fields from a stratum in such a way that the percent wheat in the 
sample is an unbiased estimate of the percent wheat in the stratum. 
Although "wheat" plays a role in the discussion, the result applies to 
other crops and attributes whether estimated singly or in groups. Also, 
the sampled entities are called "fields", for convenience. 

At first glance, it would appear that there isn't a problem. Why 
do we need to do anything more complicated than draw a simple random 
sample, that is, we decide on the number k of fields to sample and then, 
giving all samples of k fields equal probability, choose one at random? 

The answer is that this simple scheme results in a biased estimate. 
To produce an unbiased estimate, we sample the first field with proba- 
bility proportional to size and the remaining k-1 fields with equal 
probability, a technique first described by H. Midzuno in 1951 [27], [28] 
and [29]. 

Because this technique defies intuition and could become a source 
of doubt and controversy, a more extended discussion of it than is pro- 
vided by the references will be given here. 

Suppose we are sampling k fields from a total of B in the stratum 
and that each field, i, has $n_i$ pixels and has a proportion $p_i$ of wheat. 
Then $n_i p_i$ is the number of wheat pixels in Field i. 

The proportion $\hat{p}$ of wheat in the sample is 

$$\hat{p} = \frac{\sum_i n_i p_i}{\sum_i n_i}$$ 


Bias that may be introduced by omitting some strata from the aggre- 
gation process is discussed in Section 3. 1.2.1. 
where the sums are taken over all fields in the sample. The proportion p 
of wheat in the stratum is the same expression except that the sums are 
taken over all fields in the stratum. The problem is to find a proba- 
bilistic method of selecting the sample so that the wheat proportion in 
the sample is an unbiased estimate of the wheat proportion in the stratum. 

That the simple random sample is not such a method is shown by the 
following example. Suppose we have three fields, a, b and c, in the 
stratum, we are sampling just one field, and the $n_i$'s and $p_i$'s are as 
follows: 


Field      $n_i$      $p_i$      $n_i p_i$ 

a           50        0.6         30 
b           20        0.2          4 
c           10        0.1          1 

TOTAL       80                    35 


p is 35/80 = 0.4375. $\hat{p}$ is 0.6, 0.2 or 0.1 depending on whether 
a, b or c is chosen as the sample. According to the simple random 
sampling scheme, each of these samples has equal probability, 1/3, of 
being chosen. The expected value of $\hat{p}$ is obtained by multiplying the 
probability of each sample by $\hat{p}$ for that sample and summing. So 

$$E\hat{p} = \tfrac{1}{3} \times 0.6 + \tfrac{1}{3} \times 0.2 + \tfrac{1}{3} \times 0.1 = 0.3$$ 

which is not equal to 0.4375. Thus $\hat{p}$ is a biased estimator of p. 

But if we apply the Midzuno technique to this special case where 
just one field is chosen, we choose that field with probability pro- 
portional to size. Then Field a has a probability 50/80 of being 
chosen, Field b, 20/80 and Field c, 10/80. 
The expected value of $\hat{p}$ is 

$$\frac{50}{80} \times 0.6 + \frac{20}{80} \times 0.2 + \frac{10}{80} \times 0.1 = \frac{35}{80}$$ 

which = p. Thus the Midzuno technique is unbiased in this example. 

The part of the Midzuno technique that strains our intuition is 
choosing fields subsequent to the first with equal probability. Let 
us see how this technique handles the choice of two fields in our 
example. There are three possible samples, (a and b), (a and c) and 
(b and c). The probability of each sample, the wheat proportion in 
each sample, and the product of the two for computing $E\hat{p}$ are given in 
the table below: 


Sample      Sample Probability                Sample Wheat Proportion $\hat{p}$             Sample Probability x Wheat Proportion 

a and b     (50/80 x 1/2) + (20/80 x 1/2)     [(50 x 0.6) + (20 x 0.2)] / (50 + 20)     (30 + 4) / (80 x 2) 

a and c     (50/80 x 1/2) + (10/80 x 1/2)     [(50 x 0.6) + (10 x 0.1)] / (50 + 10)     (30 + 1) / (80 x 2) 

b and c     (20/80 x 1/2) + (10/80 x 1/2)     [(20 x 0.2) + (10 x 0.1)] / (20 + 10)     (4 + 1) / (80 x 2) 

The sample probability of (a and b) is computed by realizing that 
this sample can come about in two ways: either a can be chosen first 
with probability 50/80 and b is then chosen with probability 1/2 or b 
is chosen first with probability 20/80 and then a is chosen with proba- 
bility 1/2. 

To get the expected value of $\hat{p}$, we multiply the sample probability 
by $\hat{p}$ for each sample and sum. The number of pixels in the sample appears 
in the denominator of the wheat proportion and in the numerator of the 
sample probability. This factor cancels when we multiply the two and we 
are left with the uncluttered expressions in the right hand column. When 
this column is summed, we get 

$$E\hat{p} = \frac{2\,(30 + 4 + 1)}{2 \times 80} = \frac{35}{80}$$ 

which = p. And so again, the unbiasedness of the Midzuno technique is 
exhibited. 

An algebraic proof of the unbiasedness of the Midzuno technique, 
built on the insights of the previous example, is presented in Appendix A. 

We note that in the example just given, we chose a sample with proba- 
bility proportional to the size of the sample (i.e., the number of pixels 
in the sample): sample (a and b) had probability (50 + 20) x constant; 
(a and c), (50 + 10) x constant; (b and c), (20 + 10) x constant. This 
conclusion holds in general (See Appendix A). Thus we can think of the 
Midzuno technique as a random mechanism for selecting a sample of quasi- 
fields with probability proportional to size. A more direct mechanism 
would be to enumerate all possible samples, give each a probability pro- 
portional to size, compute cumulative probabilities for the sequence of 
samples, choose a random number between 0 and 1 and observe at which 
sample it falls within the cumulative probabilities. We don't use this 
mechanism because as k and B increase it becomes rapidly impractical. 

If k=5 and B=150, for example, we would have to compute 590 million 
probabilities. We are fortunate to have a practical mechanism that 
achieves the same end. 
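
The three-field example can be checked by brute-force enumeration, as in the sketch below. It uses exact fractions, compares the expected sample proportion under simple random sampling and under the Midzuno scheme, and includes a hypothetical `midzuno_draw` helper showing the two-stage draw. The function names are illustrative; the sample probability used is the size of the sample divided by (total pixels x C(B-1, k-1)), which reduces to the values in the table above when B = 3 and k = 2.

```python
from fractions import Fraction
from itertools import combinations
from math import comb
import random

# Field name -> (pixels n_i, wheat proportion p_i), from the example above.
FIELDS = {
    "a": (50, Fraction(6, 10)),
    "b": (20, Fraction(2, 10)),
    "c": (10, Fraction(1, 10)),
}


def wheat_proportion(sample, fields):
    """Wheat pixels divided by total pixels over the fields in the sample."""
    pixels = sum(fields[f][0] for f in sample)
    wheat = sum(fields[f][0] * fields[f][1] for f in sample)
    return wheat / pixels


def midzuno_probability(sample, fields):
    """Probability of drawing this set of fields under the Midzuno scheme."""
    total = sum(n for n, _ in fields.values())
    size = sum(fields[f][0] for f in sample)
    return Fraction(size, total * comb(len(fields) - 1, len(sample) - 1))


def expected_proportion(k, fields, scheme):
    """Exact expected sample wheat proportion; scheme is 'simple' or 'midzuno'."""
    samples = list(combinations(sorted(fields), k))
    result = Fraction(0)
    for s in samples:
        if scheme == "simple":
            prob = Fraction(1, len(samples))
        else:
            prob = midzuno_probability(s, fields)
        result += prob * wheat_proportion(s, fields)
    return result


def midzuno_draw(k, fields, rng=random):
    """First field with probability proportional to size, the rest equiprobable."""
    names = sorted(fields)
    first = rng.choices(names, weights=[fields[f][0] for f in names])[0]
    rest = rng.sample([f for f in names if f != first], k - 1)
    return [first] + rest


print(expected_proportion(1, FIELDS, "simple"))    # biased
print(expected_proportion(2, FIELDS, "midzuno"))   # equals 35/80
print(comb(150, 5))                                # samples to enumerate for k=5, B=150
```

The last line gives the count of possible samples behind the "590 million" figure, which is why the two-stage draw is preferred to direct enumeration.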

2.2.3 SPECTRAL STRATIFICATION 

In order to increase the efficiency of the sampling of quasi-fields 
to be labeled, the population of quasi-fields is split up into strata. 
It is well known [30] that if the stratification has some relation to 
the attribute being estimated and if the strata are sampled in proportion 
to their size, a more precise estimate is obtained from the sample. 
In Procedure M the stratification is done by an algorithm, BCLUST, 
which groups together quasi-fields that are spectrally similar for all 
the biophases observed. The effect is to concentrate the crops of 
interest into a few of the spectral strata. 

The first step is to put the quasi-fields of a segment in random 
order. Omitted from the list are the so-called "small fields", namely, 
those that have no interior pixels. (An "interior pixel" is one that 
faces pixels from the same quasi-field on all four sides.) The small 
quasi-fields, usually stringy boundary areas between real fields, are 
omitted because they are difficult to label and subject to registration 
errors. The parameters of the algorithm setting up the quasi-fields are 
chosen so that few of the pixels in the segment are in small quasi-fields. 

The large quasi-fields are clustered using the multitemporal spectral 
mean vectors. The means are computed over the interior pixels only, 
because these pixels are less likely to be subject to registration errors 
or to mixed spectral responses and they therefore more purely represent 
the crop or material present in the quasi-field. Thus the intent of 
stratifying the data according to crop or material is further realized. 

The distance measure used in the clustering is 

$$d_i = \sum_{j=1}^{nchan} w_j^2 \,(x_j - \bar{x}_{ji})^2$$ 

where 

$x_1, \ldots, x_{nchan}$ is the data vector (a quasi-field mean) 

$\bar{x}_{1i}, \ldots, \bar{x}_{nchan,i}$ is the mean vector of cluster i 

nchan is the total number of multitemporal spectral channels 

$d_i$ is the distance from the data vector to cluster i 

$w_j$ is a weight on channel j 
Each new data point x is assigned to the cluster i for which $d_i$ is 
smallest, except that if the minimum $d_i$ is greater than a parameter $\tau$, a 
new cluster is created with its mean initially at x. As each point enters 
a cluster, it is included in the calculation of the cluster mean by the 
updating formula 

$$\text{new } \bar{x}_i = \frac{n_i}{n_i + 1}\,\text{old } \bar{x}_i + \frac{1}{n_i + 1}\,x$$ 

where $n_i$ is the former number of points in cluster i. 

The number of clusters created depends on the weights $w_1, \ldots, w_{nchan}$ 
and $\tau$. The larger $\tau$ is in relation to the weights, the smaller the number 
of clusters. Appendix B discusses how these parameter values were set. 

The present implementation of BCLUST has a provision for repeating 
automatically with appropriate changes in $\tau$ until a desired number of 
clusters is achieved. Other options include switches for turning off the 
creating and updating capabilities, a provision for seeding clusters with 
arbitrary values or with the means from a previous run, the use of a data 
transformation matrix rather than merely a set of weights, and a provision 
to start with a small value of $\tau$ and increase it asymptotically to a desired 
final value. This last option has the effect of seeding the clusters with 
the first data vectors and tends to produce clusters that are more uniform 
in size. 

BCLUST also has the capability of incorporating collateral informa- 
tion into the distance formula when strata are being formed from a pool 
of quasi-fields from several segments. The collateral information, such 
as a moisture index or crop calendar figure, is a single value for the 
whole segment. This capability of BCLUST is not used in Procedure M for 
spring wheat, which operates only on single segments. 
In running BCLUST for Procedure M, we have used the repeated run 
option to converge to a desired number of clusters, then used the means 
of the converging run as seeds to make another pass through the data 
with no creating or updating. We have also used the Tasseled Cap channels 
Brightness and Greenness, and have set the weights in inverse proportion 
to the effective ranges of the variables. The specific parameters used 
are presented in Appendix B. 
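
A single-pass clustering of the kind described in this section can be sketched as follows. This is an illustrative reading, not the BCLUST source: it assumes the distance is the weighted sum of squared differences compared directly with the threshold, and it treats each seed as one prior point when updating (an assumption). The function and argument names are hypothetical.

```python
def bclust(points, weights, tau, seeds=None, create=True, update=True):
    """Assign each point to its nearest cluster, creating a new cluster
    when the smallest distance exceeds tau. Returns (labels, means)."""
    means = [list(s) for s in (seeds or [])]
    counts = [1] * len(means)  # assumption: a seed counts as one prior point
    labels = []
    for x in points:
        if not means and not create:
            raise ValueError("no seed clusters and creation is switched off")
        dists = [
            sum((w * (xj - mj)) ** 2 for w, xj, mj in zip(weights, x, m))
            for m in means
        ]
        best = min(range(len(means)), key=dists.__getitem__) if means else None
        if best is None or (create and dists[best] > tau):
            # Start a new cluster with its mean at the data point.
            means.append(list(x))
            counts.append(1)
            labels.append(len(means) - 1)
            continue
        if update:
            # Running-mean update: n/(n+1) * old mean + 1/(n+1) * x.
            n = counts[best]
            means[best] = [(n * mj + xj) / (n + 1)
                           for mj, xj in zip(means[best], x)]
            counts[best] += 1
        labels.append(best)
    return labels, means
```

The two-pass use described above would then be a first call with creating and updating on, followed by `bclust(points, weights, tau, seeds=means, create=False, update=False)` to obtain the final assignments against fixed means.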

2.2.4 DEFINITION OF SPATIAL FEATURES 

An aerial photograph of an agricultural area shows that the scene 
is divided into areas called fields, usually rectangular or some other 
simple shape, within which the color is nominally uniform. A field in 
Landsat data is similarly defined: a group of neighboring pixels in 
some simple shape whose spectral characteristics are very likely to be 
uniform. 

Because of the one-acre resolution of Landsat data, it does not 
seem possible to reconstruct, without further information, the fields 
that would be evident in higher resolution data. But we can define 
groups of pixels that we call quasi-fields that have properties similar 
to fields, namely that their pixels are spatially close and spectrally 
similar. 

Several purposes are served by such a definition: 

1. Using the quasi-field as the unit of analysis rather than 
the pixel increases processing efficiency through a data 
compression factor of about 30. 

2. Averaging the pixel values over a quasi-field smooths out 
noise in the data. 

3. Stripping away the edge pixels of the quasi-field allows 
working with the relatively pure interior pixels. Purity 
refers not only to a uniform spectral response but also to 
the invariance of the associated ground truth, as demonstrated 
in tests on Kansas and North Dakota segments [20,23]. This 
purity contributes to the success of labeling techniques, 
whether carried out by humans or by computer programs. It 
also contributes to the grouping of quasi-fields into meaning- 
ful strata that reduce sampling error. 


The algorithm used by Procedure M to create quasi-fields is called 
BLOB; other possibilities might have been AMOEBA [31] or ECHO [32]. 
BLOB is a clustering algorithm similar to BCLUST (Section 2.2.3) but 
based on the spatial channels, line number and point number, in addition 
to the spectral channels. 

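The distance function for deciding which quasi-field a pixel belongs 
to is: 

$$d_i = \sum_{j=1}^{nchan} \frac{(x_j - \bar{x}_{ji})^2}{V_j} + \frac{(L - \bar{L}_i)^2}{V_L} + \frac{(P - \bar{P}_i)^2}{V_P}$$ 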

where 

$x_1, \ldots, x_{nchan}$ is the spectral data vector for a pixel 

L is the pixel line number 
P is the pixel point number 

$\bar{x}_{1i}, \ldots, \bar{x}_{nchan,i}$ is the spectral mean vector of quasi-field i 

$\bar{L}_i$ is the mean line number of quasi-field i 
$\bar{P}_i$ is the mean point number of quasi-field i 
nchan is the number of spectral channels 
$d_i$ is the distance from the pixel to quasi-field i 

$V_j$, $V_L$ and $V_P$ are weights attached to the spectral and 
spatial variables in the distance function 
The pixel joins the quasi-field with the smallest $d_i$ provided that 
this value is less than a parameter $\tau$. Otherwise the pixel starts a 
new quasi-field. The mean vector for a quasi-field is computed from 
all the pixels in the quasi-field by an updating formula as in BCLUST. 

The numbers $V_1, \ldots, V_{nchan}$, $V_L$, $V_P$ and $\tau$ are parameters of the 
algorithm that affect its performance. The larger $\tau$ is relative to 
the others, the larger the quasi-fields are and the fewer there are of 
them. Increasing $\tau$ also reduces the number of quasi-fields with no 
interior pixels, fields that are left out of the stratified sampling. 
Large values of $V_L$ and $V_P$ relative to $V_1, \ldots, V_{nchan}$ have the effect 
of emphasizing the spectral, rather than the spatial, variables, and 
may produce quasi-fields that are not very cohesive geographically. 
Relatively small values of $V_L$ and $V_P$ emphasize the spatial variables 
and may produce compact quasi-fields that are not as homogeneous spec- 
trally as one would like and may subdivide large fields. Information 
regarding the parameters of BLOB is reported in Appendix B. 
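
The trade-off between the spatial and spectral weights can be illustrated with a small sketch. It assumes that each weight divides its squared difference, which is the reading under which large spatial weights emphasize the spectral variables as stated above; the exact form of the BLOB distance, the dictionary layout and the function names are assumptions made for illustration.

```python
def blob_distance(spec, line, point, qf, v_spec, v_line, v_point):
    """Distance from a pixel to quasi-field `qf`, a dict with keys
    'spec' (mean spectral vector), 'line' and 'point' (mean position).
    Assumed form: each squared difference is divided by its weight."""
    spectral = sum((x - m) ** 2 / v for x, m, v in zip(spec, qf["spec"], v_spec))
    spatial = ((line - qf["line"]) ** 2 / v_line
               + (point - qf["point"]) ** 2 / v_point)
    return spectral + spatial


def assign_pixel(spec, line, point, quasi_fields, v_spec, v_line, v_point, tau):
    """Index of the nearest quasi-field, or None when the smallest distance
    is not less than tau and the pixel should start a new quasi-field."""
    if not quasi_fields:
        return None
    dists = [blob_distance(spec, line, point, qf, v_spec, v_line, v_point)
             for qf in quasi_fields]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best if dists[best] < tau else None
```

With two quasi-fields, one spatially adjacent to the pixel and one spectrally identical but far away, very large spatial weights send the pixel to the distant spectral match, while small spatial weights keep it with its neighbor or start a new quasi-field, depending on the threshold.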

It is possible that clouds may obscure field patterns. When the 
BLOB algorithm is applied to multiple time periods, it is now possible 
to avoid data that are cloud covered. This was accomplished by a modi- 
fication that first excludes any channel data that have been flagged by 
the screening process (i.e., using only cloud-free data) in computing $d_i$, 
and adapting the $\tau$ distance factor to reflect the use of fewer channels 
of information. 

The BLOB algorithm allows the use of an alternate distance function 
that favors the formation of rectangular fields. When used, the line and 
point coordinates L and P are rotated by a linear transformation to obtain 
l and p that measure in the North-South and East-West directions, respec- 
tively. Then the last two terms of the distance function are replaced by 



Using this distance function, all points equi-distant from the spatial mean 
of a quasi-field form a North-South/East-West rectangle rather than an ellipse. 
2.2.5 ATMOSPHERIC HAZE CORRECTION 

An atmospheric haze correction, preceded by data screening and 
followed by a data transformation (to the Tasseled-Cap linear channel 
combinations), forms the preprocessing component of Procedure M. During 
the research leading to Procedure M, it became increasingly apparent 
that an atmospheric correction algorithm was needed which would compen- 
sate not only for large scale (segment-to-segment) atmospheric variations, 
but for smaller scale (within segment) variations as well. For this pur- 
pose the application of the XSTAR haze correction algorithm [33,34] (which 
was developed during a previous effort) was changed from a global appli- 
cation (a fixed correction throughout a segment) to a spatially varying 
application (a variable correction within each segment). The resulting 
algorithm is called the spatially varying XSTAR haze correction [35]. 

In principle, the spatially varying XSTAR algorithm calculates its 
haze diagnostic within a moving window (which has a 15 pixel diameter 
between half amplitude points), using only those pixels which have passed 
the screening procedure, and then applies its correction to the pixel at 
the center of the window. However, in the detailed implementation of the 
procedure the application of the moving window is quantized as described 
below. This quantization reduces the execution time for the procedure on 
the computer, with the result that the spatially varying XSTAR procedure 
costs slightly less than twice as much to run as the former global XSTAR 
procedure cost. 

The spatially varying XSTAR procedure follows six steps, as outlined 
in Table 2.10. In the first step, the SCREEN procedure [33,34] is applied 
to the data to flag pixels (e.g., bad data, dense clouds, cloud shadows, 
or water) which are not usable in the haze diagnostic procedure. However, 
for the spatially varying XSTAR correction, two of the SCREEN thresholds 
are relaxed somewhat, as described in Reference 35. This allows more 
extreme haze concentrations to be diagnosed and corrected than had been 
the case previously with the global XSTAR correction (which needed to 
TABLE 2.10 STEPS IN SPATIALLY VARYING XSTAR HAZE 
CORRECTION PROCEDURE 


• Screen Data (Using Less Stringent Cloud and Dense Haze Thresholds) 

• Calculate Mean Signal Values for 5 Line by 5 Pixel Blocks (Using Only 
"Good" Pixels) 

• Calculate Spatially Smoothed Mean Values for Blocks, Using a Moving 
Window Filter 

• Use Spatial Interpolation/Extrapolation to Estimate Mean Values for 
Blocks Which Have an Insufficient Number of Good Pixels 

• Calculate XSTAR Correction Appropriate at Each Block Center 

• Interpolate XSTAR Correction to Apply to Each Pixel 
exclude pixels within haze concentrations which were not typical of the 
majority of the segment to be corrected). The relaxed thresholds also 
help the correction algorithm to track haze variations more accurately. 

The second step of the spatially varying haze correction procedure 
divides the scene into 5 line by 5 pixel blocks, and calculates a mean 
value for each block, using only signal values from the "good" pixels 
within the blocks. ("Good" pixels are those pixels which pass the 
SCREEN procedure.) Mean values for blocks with no good pixels or with 
fewer good pixels than half the average number of good pixels per block 
(truncated to integer form) are not used. For these "unknown" blocks, 
mean values are estimated by interpolating or extrapolating from neighbor- 
ing block mean values, as described in Steps 3 and 4 below. 

In the third step of the procedure, the mean values of the 5 line 
by 5 pixel blocks are smoothed, using a non-recursive moving window 
filter. The filter approximates a Gaussian shape, with a 3 block dia- 
meter between half amplitude points. In this stage, smoothed mean values 
are calculated for all blocks with "known" mean values, and for all 
blocks with at least one near neighbor (either along-track or across- 
track) which has a "known" mean value. The smoothed mean value is the 
weighted average of the available "known" mean values within the window 
of the filter. 

Step 4 of the spatially varying XSTAR procedure is used to assign 
smoothed mean values to blocks which still have "unknown" mean values 
after Step 3. In this step only those blocks which have "unknown" mean 
values, but which have at least one near neighbor (either along-track 
or across-track) with a smoothed mean value, are assigned smoothed mean 
values according to the procedure of Step 3. Step 4 is iterated until 
all blocks have smoothed mean values. 
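
Steps 2 through 4 can be sketched with NumPy as below. This is a simplified illustration under stated assumptions, not the XSTAR implementation: "unknown" blocks are marked with NaN, the filter window is taken to extend two blocks either side of center, the Gaussian width is derived from the 3-block half-amplitude diameter, and the fill passes of Step 4 average already-smoothed values. All function names are hypothetical.

```python
import numpy as np

BLOCK = 5  # 5 line by 5 pixel blocks


def block_means(signal, good):
    """Step 2: mean of the good pixels in each block; NaN marks 'unknown'."""
    rows, cols = signal.shape[0] // BLOCK, signal.shape[1] // BLOCK
    s = signal[:rows * BLOCK, :cols * BLOCK].reshape(rows, BLOCK, cols, BLOCK)
    g = good[:rows * BLOCK, :cols * BLOCK].reshape(rows, BLOCK, cols, BLOCK)
    n_good = g.sum(axis=(1, 3))
    totals = np.where(g, s, 0.0).sum(axis=(1, 3))
    min_good = int(n_good.mean() / 2)  # half the average, truncated
    known = (n_good > 0) & (n_good >= min_good)
    means = np.full((rows, cols), np.nan)
    means[known] = totals[known] / n_good[known]
    return means


def gaussian_kernel(radius=2, fwhm=3.0):
    """Approximately Gaussian weights with a 3-block half-amplitude diameter."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    ax = np.arange(-radius, radius + 1)
    yy, xx = np.meshgrid(ax, ax, indexing="ij")
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))


def weighted_average(values, kernel):
    """Kernel-weighted mean of the non-NaN entries around each block."""
    r = kernel.shape[0] // 2
    valid = ~np.isnan(values)
    v = np.pad(np.where(valid, values, 0.0), r)
    m = np.pad(valid.astype(float), r)
    num = np.zeros(values.shape)
    den = np.zeros(values.shape)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            w = kernel[dy, dx]
            num += w * v[dy:dy + values.shape[0], dx:dx + values.shape[1]]
            den += w * m[dy:dy + values.shape[0], dx:dx + values.shape[1]]
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, np.nan)


def has_near_neighbor(valid):
    """True where an along-track or across-track neighbor is valid."""
    p = np.pad(valid, 1)
    return p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]


def smooth_and_fill(means):
    """Steps 3 and 4: smooth known blocks, then grow into unknown blocks."""
    kernel = gaussian_kernel()
    known = ~np.isnan(means)
    if not known.any():
        raise ValueError("no block has a known mean value")
    eligible = known | has_near_neighbor(known)
    smoothed = np.where(eligible, weighted_average(means, kernel), np.nan)
    while np.isnan(smoothed).any():
        done = ~np.isnan(smoothed)
        grow = ~done & has_near_neighbor(done)
        smoothed = np.where(grow, weighted_average(smoothed, kernel), smoothed)
    return smoothed
```

The result has one smoothed value per block, which is the input the haze diagnostic of Step 5 would need at each block center.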

In Step 5 of the procedure, the smoothed block mean values are used 
as XSTAR haze diagnostics, and the multiplicative and additive correction 
factors appropriate for each block center are calculated from them. 
Finally in Step 6 the multiplicative and additive correction factors 
calculated from the block means in Step 5 are interpolated between block 
centers (in two dimensions) to determine the appropriate correction 
factor for each pixel. These correction factors are then applied pixel 
by pixel. For this step a curvilinear interpolation is used which 
employs an approximately Gaussian interpolation weighting function, 
described in Reference 35. Pixels which are near the borders of the 
scene, so that only one or two block centers are within the interpolating 
range (±4 lines and ±4 pixels) of the pixel, are corrected by interpolating 
the correction factors calculated for only those blocks whose centers are 
within the interpolating range. 

The spatially varying XSTAR procedure results in an effective atmos- 
pheric haze correction (for Landsat agricultural MSS data) with a 15 pixel 
(~1.2 x 0.9 km) spatial resolution of haze variability. The performance 
characteristics of this haze correction are discussed in Section 3.2.4. 



3 

TEST AND EVALUATION OF PROCEDURE M CONFIGURED FOR SPRING WHEAT 

Procedure M utilizes a statistical sampling strategy to inventory 
crop acreage. The procedure is constructed in a modular way and was 
designed to function within the LACIE framework. This section presents 
an evaluation of a spring wheat configuration of Procedure M in estimating 
spring small grain and spring wheat acreages in the Northern Great Plains. 
The evaluation considers the performance of the overall procedure in 
providing acreage estimates (Section 3.1), as well as the performance 
of the individual components (modules) that comprise the procedure 
(Section 3.2). 

3.1 TEST AND EVALUATION OF SYSTEM PERFORMANCE 

Test and evaluation of Procedure M performance is presented in 
three parts — the experiment design, results, and a summary. 

3.1.1 EXPERIMENT DESIGN 

The major objective of the evaluation was to gain an understanding, 
in a statistical sense, of the overall performance of Procedure M as 
configured for spring wheat. That is, the experiment was to characterize 
the procedure's performance in terms of bias and variance of crop pro- 
portion estimates. In addition to spring wheat and other spring small 
grain estimates, the accuracy of the total spring small grain estimates 
was to be evaluated. 

The specific software configuration employed to conduct this evalua- 
tion is described in Appendices B and C. Experiment parameters are listed 
in Table 3.1. Note that both Phase 2 and Phase 3 segments were used. 
This was done to evaluate whether the machine labeling procedure for 
spring wheat, described in Section 2 and developed using Phase 3 sites, 
could be extended to another crop year, represented by the Phase 2 sites. 





TABLE 3.1 EXPERIMENT PARAMETERS 

• 26 Northern Great Plains Phase 2 and Phase 3 LACIE Blind Sites 

• Up to 7 Acquisitions for Each Site 

- 3 or 4 Chosen for Field and Strata Definition 

- 3 or 4 Chosen for Automatic Labeling 

• 4 Stratification Cases (B Clusters — 1,20,40,60) 

• 5 Field Sample Cases (Quasi-Fields or Blobs — 40,60,80,100,120) 

• 50 Estimates Per Case (Random Field Sample Replicates) 

• 26,000 Proportion Estimates Total 
The 26 segments chosen for evaluation were selected so that they 
would geographically represent the major spring wheat growing regions 
in the United States Northern Great Plains (Figure 3.1). The propor- 
tions of segments planted to spring small grains vary from about 70% 
to near 9%, as is illustrated in Figure 3.2 in which segments are ranked 
according to their spring small grain proportions.' The specific acqui- 
sition dates available and used for these segments are listed in. 

Appendix D. 

Procedure M for spring wheat utilizes 100 labeled samples drawn 
from 40 spectral strata. This experiment sought to characterize the 
performance of other sampling strategies as well. Unstratified sampling
was carried out in addition to sampling from within 20, 40, and 60 strata 
to enable measuring the variance reduction (R factor) due to stratified 
sampling. The R factor is a measure of the efficiency of stratified 
sampling and is defined as: 


$$R_{mn} = \frac{\sigma_{mn}^{2}}{\sigma_{1n}^{2}}$$

where

m is the number of strata

n is the number of fields sampled

$\sigma_{mn}^{2}$ is the measured variance of the procedure
(over the 50 random field sample replicates)

$\sigma_{1n}^{2}$ is the measured variance for the one-stratum
case (i.e., unstratified)
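
As a minimal sketch of how the R factor can be computed from replicate estimates, the function below takes two lists of proportion estimates for the same number of sampled fields, one from stratified and one from unstratified sampling, and returns the ratio of their sample variances. The replicate values in the example are made up for illustration and are not results from this experiment.

```python
from statistics import variance


def r_factor(stratified_estimates, unstratified_estimates):
    """Ratio of the measured variance with m strata to the one-stratum variance.

    Both arguments are replicate proportion estimates obtained with the same
    number of sampled fields n.
    """
    return variance(stratified_estimates) / variance(unstratified_estimates)


if __name__ == "__main__":
    # Hypothetical replicate estimates (percent), five replicates each
    unstratified = [30.0, 34.0, 38.0, 32.0, 36.0]
    stratified = [33.0, 34.0, 35.0, 33.5, 34.5]
    print(round(r_factor(stratified, unstratified), 4))
```

A value below 1 indicates that stratification reduced the sampling variance; a value of 1 indicates no gain.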


Once samples are drawn, the quasi-fields are labeled. To evaluate 
the efficiency of the procedure in terms of its variance characteristics 
juxtaposed to the gains that might be achieved in the efficiency of 
labeling, sets of 40, 60, 80, 100 and 120 samples were drawn and results 
using them were compared. 
FIGURE 3.2 AMOUNT OF SPRING SMALL GRAIN PER SEGMENT
The spring wheat configuration of Procedure M uses the previously
described two-step labeling mechanism. The first step involves analyst
interpretation and labeling of samples as 'Spring Small Grain' or 'Other'.
Ground truth* was used as a substitute for analyst interpreter labels
in this experiment. The second step involves a further discrimination
of the grain samples by the machine labeler. These labels are then
utilized in aggregating a segment spring wheat proportion estimate.

To characterize both bias and variance characteristics of the pro- 
cedure, estimates were made by replicating the process of drawing 50 sets 
of samples for each combination of stratification and field sampling cases. 

3.1.2 SYSTEM PERFORMANCE RESULTS 

The results presented in this section consider both portions of the
two-stage labeling mechanism. First, three aspects of the performance 
of the procedure in estimating total spring small grains are presented: 

(1) Average performance using 40 strata and 100 labeled samples 

(2) Bias due to ignoring certain strata 

(3) Parametric evaluation of sampling variance 

This analysis evaluates the performance of the two-class procedure with
respect to ground truth labels. 

Then, three aspects of the performance of the procedure in estimating 
three classes — spring wheat, other spring small grains, and other — are 
presented: 

(1) Average performance using 40 strata and 100 labeled samples 

(2) Performance for various partitions of the segments 

(3) Parametric evaluation of sampling variance 


*Wall-to-wall ground truth provided by JSC and prepared in subpixel
format by Lockheed Electronics Company and ERIM personnel was used.
This three-class analysis evaluates the performance of the procedure
with the machine labeler. The intent is to show not only areas of 
strength but also to make recommendations to improve the spring wheat 
labeling accuracy and to evaluate whether the labeling strategy employed 
can meet needs in an operational setting. 


3.1.2.1 Spring Small Grain Estimates (Two-Class)

Performance Using 40 Strata and 100 Labeled Samples. Overall, the
small grain proportion in the 26 LACIE blind sites was 32.3%. The average
estimate made using Procedure M was 34.3%, an absolute error of 1.9%
and a relative error of 6%. This estimate is the average of 50 replicates
of 100 labeled samples from 40 strata in each of the segments.
There was no statistically significant difference between this estimate
and those derived similarly from other combinations of strata and sample
size, although their variance reductions (sampling efficiencies) were
significantly different.

Figure 3.3 illustrates the accuracy of these average estimates on
an individual segment basis. The average standard deviation about each
of these points was 2.5%, while their RMS error about the 45° line was 3.66%
and the $R^{2}$ about the regression line was 0.987. Overall, the estimates were
accurate with a slight positive bias introduced as the percentage of small
grain in the segment increased.

Bias Due to Ignoring Certain Strata. Procedure M samples only
quasi-fields with interior pixels. Assuming that the estimate for these
fields is unbiased, the expected bias due to not sampling from the smaller
fields is given by the expression:


$$b = \frac{N - M}{N}\,(P_s - P_u)$$

where

$P_s$ is the crop proportion in the quasi-fields from
which samples are drawn

$P_u$ is the crop proportion in the quasi-fields from
which no samples were drawn

N is the total number of pixels

M is the total number of pixels in the sampled strata

*Relative error is (E - T)/T, where E is the estimate and T the true grain proportion.

FIGURE 3.3 PROCEDURE M SEGMENT ESTIMATES OF TOTAL SPRING SMALL GRAINS

If $P_s = P_u$ or if $M \approx N$, no significant bias is introduced. The
expected bias in the 26 sample segments was computed to be 1.9%, which
is about equal to the absolute error that was measured. Hence Procedure M
was found to be virtually unbiased with respect to the fields that were sampled.
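
The expected bias expression can be checked numerically. The sketch below assumes, as the text does, that the estimate is unbiased for the sampled quasi-fields, so that the estimate equals the crop proportion in those fields while the truth is the pixel-weighted average over sampled and unsampled fields. The pixel counts and proportions in the example are hypothetical.

```python
def expected_bias(n_total, n_sampled, p_sampled, p_unsampled):
    """Expected bias from ignoring the unsampled (small-field) pixels.

    n_total     -- total number of pixels (N)
    n_sampled   -- pixels in the sampled strata (M)
    p_sampled   -- crop proportion in quasi-fields that are sampled
    p_unsampled -- crop proportion in quasi-fields that are not sampled
    """
    return (n_total - n_sampled) / n_total * (p_sampled - p_unsampled)


def true_proportion(n_total, n_sampled, p_sampled, p_unsampled):
    """Pixel-weighted crop proportion over the whole segment."""
    return (n_sampled * p_sampled + (n_total - n_sampled) * p_unsampled) / n_total


if __name__ == "__main__":
    # Hypothetical segment: 70% of pixels lie in sampled quasi-fields
    N, M, Ps, Pu = 1000, 700, 0.35, 0.30
    bias = expected_bias(N, M, Ps, Pu)
    # The bias equals (estimate - truth) when the estimate is Ps
    print(round(bias, 4), round(Ps - true_proportion(N, M, Ps, Pu), 4))
```

The bias vanishes when the two proportions are equal or when nearly all pixels fall in sampled quasi-fields.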

Figure 3.4 provides a comparison on a segment-by-segment basis 
of the expected bias (dashed line) and the measured bias. The segments
are ordered as in Figure 3.2. With the exceptions of Segments 1652 
and 1662, the estimates and expectations track closely. The unexpected 
bias encountered in Segment 1652 was the result of inadequate ground 
truth labels (See footnote in Section 3.1.2). 

The bias encountered is well understood to be a function predominantly
of ignoring the small-field strata. Three techniques are under
consideration to eliminate this bias. One strategy is to increase the
number of pixels in quasi-fields having interior pixels. This can be
accomplished by relaxing the parameter settings in the BLOB program.*
A second strategy involves a post bias correction algorithm based on a
relationship that may exist between average field size and grain
proportions. The third strategy, suited for areas dominated by smaller fields,
is to sample the small-field stratum directly. What is implied is an
initial stratification of an area into two strata, small fields and
large fields, and the employment of sampling strategies suited to each
stratum.

*Currently these parameters are fixed for all segments (See Appendix B).
In the Northern Great Plains results, an average of about 70% of each
segment was represented by quasi-fields with interior pixels.

Parametric Evaluation of Sampling Variance. The efficiency of
Procedure M in terms of its variance reduction characteristics is as
critical to its usefulness as are its bias characteristics. In this
section the variance of the spring small grain estimate derived using
Procedure M is discussed. Integral to this procedure is the concept
of stratified sampling. It has been shown that a reduction in variance
can be achieved using a stratified sampling approach [30]. Here,
empirical evidence will be provided to illustrate the degree of sampling
efficiency that can be achieved using Procedure M.

Results of the empirical tests are summarized in Figure 3.5. This
plot illustrates the dependence of the measured variance (represented
by its square root) on the number of labeled samples and the number of
strata. The standard deviation, $\sigma_{mn}$, is plotted versus the number
of labeled samples, on a semi-log graph, for each of the four strata
parameters, where, as before,

m = 1, 20, 40, or 60 denotes the number of strata

n = 40, 60, 80, 100, or 120 denotes the number of
labeled samples

$\sigma_{imn}$ denotes the standard deviation of 50 observations
from Segment i using n labeled samples and m strata.
The top curve, for one stratum, represents unstratified sampling.
Note that for 20, 40, and 60 strata, the variance encountered is substantially
reduced. The bar labeled '5% Significance Interval' has
length

$$\log\sqrt{F(1 - .05,\ 1274,\ 1274)} = \log\sqrt{1.0967} = \log(1.04725)$$

Any two standard deviation estimates whose vertical distance on the
graph exceeds the length of the bar are significantly different by the
F-test at the 5% level. A dashed line is drawn below the curve for 40
strata at a distance equal to the length of the bar. This shows
geometrically that the curve for 60 strata is not significantly lower
than the one for 40 strata, except for a sample size of 60.
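
The significance bar can be reproduced with a few lines of arithmetic. The sketch below takes the critical value 1.0967 directly from the text rather than computing it, and assumes that the 1274 degrees of freedom arise from pooling 26 segments with 49 degrees of freedom each (50 replicates per segment); that reading is consistent with the numbers but is not stated explicitly in the text. The base of the logarithm is also an assumption; the comparison itself does not depend on it.

```python
import math

F_CRITICAL = 1.0967            # F(1 - .05, 1274, 1274), as quoted in the text
DEGREES_OF_FREEDOM = 26 * (50 - 1)  # assumed pooling: 26 segments x 49


def bar_length(f_critical=F_CRITICAL):
    """Length of the 5% significance bar on the log (standard deviation) axis."""
    return math.log10(math.sqrt(f_critical))


def significantly_different(sd_a, sd_b, f_critical=F_CRITICAL):
    """True when the two standard deviations differ by more than the bar.

    Equivalent to comparing the larger-to-smaller variance ratio with the
    F critical value.
    """
    ratio = (max(sd_a, sd_b) / min(sd_a, sd_b)) ** 2
    return ratio > f_critical


if __name__ == "__main__":
    print(DEGREES_OF_FREEDOM)
    print(round(math.sqrt(F_CRITICAL), 5))
    # Hypothetical standard deviations (percent)
    print(significantly_different(2.5, 2.6), significantly_different(2.5, 2.7))
```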

The choice of using 40 strata in Procedure M for spring wheat
seems to be supported by this analysis. The choice of using 100 labeled
samples is not as good as using 120 labeled samples in the sense that
$\sigma_{40,120}$ is significantly smaller than $\sigma_{40,100}$ at the 5% level. On the
other hand, the additional cost of 20 more labels may not be warranted
by the reduction of variance gained by using 120 samples, since the
standard deviation is already well below 3% at the segment level.

The R (variance reduction) factor presented in Section 3.1.1 also
provides insight into the efficiency of stratified sampling. For
example, a procedure with R = 0.344 would need only 34.4% of the
labeling effort required to obtain the same variance using unstratified
sampling. Table 3.2 provides a matrix of R factors measured in conducting
this evaluation of Procedure M. Since R is a ratio of sampled
variances, it has an F distribution. Any two variances with

0.9118 < R < 1.0967 (F(.05, 1274, 1274) to F(.95, 1274, 1274))

are not significantly different.

Recall that average estimates were not significantly different, regardless
of strata or sample settings.
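
The labeling-effort interpretation of R can be illustrated as follows. The sketch assumes that sampling variance is inversely proportional to the number of labeled samples, which is the usual simple-random-sampling approximation and is an assumption here rather than a result of the experiment. It also shows that the lower significance limit is the reciprocal of the upper one.

```python
F_UPPER = 1.0967  # F(.95, 1274, 1274), as quoted in the text


def equivalent_unstratified_samples(n_stratified, r):
    """Unstratified samples needed to match the variance of n stratified samples.

    Assumes variance is proportional to 1/n for unstratified sampling.
    """
    return n_stratified / r


def not_significantly_different(r, f_upper=F_UPPER):
    """True when a variance ratio r lies inside the 5% non-significance band."""
    return 1.0 / f_upper < r < f_upper


if __name__ == "__main__":
    # R = 0.344 with 100 labeled samples (the example used in the text)
    print(round(equivalent_unstratified_samples(100, 0.344), 1))
    print(round(1.0 / F_UPPER, 4))
    print(not_significantly_different(0.97), not_significantly_different(0.86))
```

Under this assumption, 100 stratified samples with R = 0.344 do the work of roughly 290 unstratified samples.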
TABLE 3.2 R FACTOR MATRIX (Reduction of Variance)

                         R (Relative to Unstratified)        R' (Relative to 40 Strata)
# Strata \ # Samples     40     60     80    100    120       40     60     80    100    120
       1                1.0    1.0    1.0    1.0    1.0      2.25   2.43   2.65   2.91   2.76
      20                0.49   0.51   0.53   0.52   0.52     1.11   1.24   1.41   1.52   1.44
      40                0.44   0.41   0.38   0.34   0.36     1.0    1.0    1.0    1.0    1.0
      60                0.44   0.35   0.37   0.33   0.35     0.87   0.86   0.98   0.97   0.95

R factors are defined as follows:

$$R = \frac{\hat{\sigma}_{m,n}^{2}}{\hat{\sigma}_{1,n}^{2}} \qquad R' = \frac{\hat{\sigma}_{m,n}^{2}}{\hat{\sigma}_{40,n}^{2}}$$

If 0.9118 < R < 1.0967, then variances resulting from parameter settings
are not significantly different.
3.1.2.2 Spring Wheat, Other Spring Small Grains Estimates
(Three Class)

Performance Using 40 Strata and 100 Labeled Samples. The second
stage of the Procedure M labeling process labels the spring small grain
quasi-fields proportionally among spring wheat and other spring small
grain, except when the proper acquisition is missing. Of the 26 segments
processed, three had acquisition histories inadequate for separating
the small grains (See last entry in Table 2.9). Hence, spring
wheat estimates were made for 23 segments. The overall results achieved
for these segments appear in Table 3.3. Spring wheat was underestimated
by 2.6%, and other spring small grains were overestimated by 3.8%.
Figures 3.6 and 3.7 illustrate the average estimates on a segment-by-
segment basis. The average standard deviation for a segment was under
2%. The spring wheat estimates made for other combinations of strata
and sample sizes were not significantly different from the estimates
made using 40 strata and 100 samples.

Though these results are encouraging, they do not compare in terms 
of accuracy with those achieved in making a total spring small grain 
estimate. The RMS error in these average spring wheat estimates was
8.6% as opposed to 3.7% for all spring grains. The accuracy of the
three-class estimates is closely tied to labeler performance. While 
an in-depth evaluation of the labeler will be presented in Section 3.2.1, 
a systematic pattern to the measured error does appear in these results 
and will be discussed in this section. 

Performance for Various Partitions of the Segments . It is of 
interest to determine whether the error measured in the spring wheat 
estimates is systematic in nature as opposed to random. If systematic,
techniques to improve performance can be explored within a procedural 
context. In order to evaluate this possibility, the 23 segments 
TABLE 3.3 SPRING SMALL GRAIN ESTIMATES USING 40 STRATA AND
100 LABELED SAMPLES (23 Segments With
50 Replicates Per Segment)

                             Estimate    True     E-T
                               (%)       (%)      (%)     (E-T)/T
Spring Wheat                   15.4      18.0     -2.6     -0.15
Other Spring Small Grains      15.5      11.6      3.9      0.33
FIGURE 3.6 PROCEDURE M SEGMENT ESTIMATES OF SPRING WHEAT
FIGURE 3.7 PROCEDURE M SEGMENT ESTIMATES OF OTHER SPRING SMALL GRAINS
processed were partitioned into groupings as follows (Table D.3 in
Appendix D identifies the segments in each category):

                     No. of
Label                Segments    Description
Acceptable              23       Spring wheat estimates made
Developmental            4       Phase 3 sites used for labeler development
Problem Segments         4       Displayed poorest results
Phase 2                  5       1976 Blind Sites
Phase 3                 18       1977 Blind Sites
Red River                8       Developmental geographic region

Figure 3.8 illustrates, using scatter diagrams, the performance
attained within partitions in estimating spring wheat acreage. What
is immediately evident is that two of the partitions, Problem Segments
and Phase 2, contribute much of the error in an RMS sense. For both
developmental segments and those located near the same river valley,
accurate spring wheat estimates are achieved. Section 3.2.1 discusses
meteorological conditions that may influence these results in patterns
that at first seem geographic or annual in nature.

Results are displayed numerically for these partitions in Tables
3.4 through 3.7. Note in Table 3.4 that the relative spring wheat
error in Phase 3 sites is -1.8% as opposed to -32.7% in Phase 2
sites. It is significant that spring wheat is largely underestimated.
Four segments labeled Problem Segments exhibit exceptionally poor
spring wheat estimates (Table 3.6) with measured relative error of
-65.6%. On the other hand, the remaining nineteen segments, exhibiting
a relative error of only 3%, estimated a 16.7% spring wheat proportion
given 16.2% actual. This estimate was not significantly different from
TABLE 3.4 STRATIFIED THREE-CLASS PERFORMANCE RESULTS, Part 1

                                        Spring Wheat                      Other Spring Small Grains
                       No. of     Estimate   True     E-T               Estimate   True     E-T
Stratum                Segments     (%)      (%)      (%)    (E-T)/T      (%)      (%)      (%)    (E-T)/T
Acceptable                23       15.37    17.98    -2.60   -0.145      15.46    11.62     3.84    0.330
Phase 2 Segments           5       22.83    33.92   -11.09   -0.327      24.81     8.99    15.82    1.760
Phase 3 Segments          18       13.30    13.55    -0.25   -0.018      12.86    12.35     0.51    0.041



TABLE 3.5 STRATIFIED THREE-CLASS PERFORMANCE RESULTS, Part 2

                                        Spring Wheat                      Other Spring Small Grains
                       No. of     Estimate   True     E-T               Estimate   True     E-T
Stratum                Segments     (%)      (%)      (%)    (E-T)/T      (%)      (%)      (%)    (E-T)/T
(A) Acceptable            23       15.37    17.98    -2.60   -0.145      15.46    11.62     3.84    0.248
(B) Developmental          4       32.30    27.30     5.00    0.183      17.05    19.76    -2.71   -0.137
A Less B                  19       11.81    16.01    -4.20   -0.262      15.12     9.90     5.22    0.527



TABLE 3.6 STRATIFIED THREE-CLASS PERFORMANCE RESULTS, Part 3

                                        Spring Wheat                      Other Spring Small Grains
                       No. of     Estimate   True     E-T               Estimate   True     E-T
Stratum                Segments     (%)      (%)      (%)    (E-T)/T      (%)      (%)      (%)    (E-T)/T
(A) Acceptable            23       15.37    17.98    -2.60   -0.145      15.46    11.62     3.84    0.330
(B) Problem Segments       4        9.05    26.30   -17.25   -0.656      25.26     6.38    18.88    2.959
A Less B                  19       16.70    16.22     0.48    0.030      13.39    12.72     0.67    0.053
A Less B and
Developmental             15       12.55    13.27    -0.72   -0.054      12.42    10.85     1.57    0.145



TABLE 3.7 STRATIFIED THREE-CLASS PERFORMANCE RESULTS, Part 4

                                        Spring Wheat                      Other Spring Small Grains
                       No. of     Estimate   True     E-T               Estimate   True     E-T
Stratum                Segments     (%)      (%)      (%)    (E-T)/T      (%)      (%)      (%)    (E-T)/T
(A) Acceptable            23       15.37    17.98    -2.60   -0.145      15.46    11.62     3.84    0.330
(B) Red River Valley       8       24.74    20.29     4.45    0.219      14.52    17.37    -2.86   -0.197
A Less B                  15       10.38    16.74    -6.36   -0.380      15.96     8.55     7.41    0.867
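
The partition figures in Tables 3.4 through 3.7 can be cross-checked against one another. The sketch below assumes that each partition value is a simple average over its segments, so that the 23-segment "Acceptable" row is the segment-count-weighted mean of its Phase 2 and Phase 3 parts; this appears consistent with the tabulated spring wheat values but is an inference, not something the text states. Function names are illustrative.

```python
def pooled_mean(parts):
    """Segment-count-weighted mean of (number_of_segments, value) pairs."""
    total_segments = sum(count for count, _ in parts)
    return sum(count * value for count, value in parts) / total_segments


def relative_error(estimate, truth):
    """(E - T) / T, the relative error column of the tables."""
    return (estimate - truth) / truth


if __name__ == "__main__":
    # Spring wheat rows of Table 3.4: (segments, estimate %), (segments, true %)
    estimate = pooled_mean([(5, 22.83), (18, 13.30)])
    truth = pooled_mean([(5, 33.92), (18, 13.55)])
    print(round(estimate, 2), round(truth, 2))
    print(round(relative_error(estimate, truth), 3))
    # Phase 2 spring wheat relative error
    print(round(relative_error(22.83, 33.92), 3))
```

Recombining the Phase 2 and Phase 3 rows in this way returns the 23-segment spring wheat estimate and truth to within rounding.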
the truth at the 0.05 significance level. Table 3.7 indicates that more
accurate estimates were made in the vicinity of the developmental
segments than otherwise.


Parametric Evaluation of Sampling Variance. The sampling variances
measured for spring wheat estimates are parametrically illustrated in
Figure 3.9. Again, these represent the aggregated performance with
50 replicates over the 23 segments. A 5% significance bar is provided,
as in Figure 3.5. Recall that since this graph is on a semi-log scale,
this bar can be displaced to any position on the graph and will encompass
curves for sampling parameters whose procedural variances are not significantly
different. Once again, for a fixed number of labeled samples, the
use of 40 or 60 strata results in variances that are not significantly different
at the 0.05 level. The variance reduction (R factor) relative to unstratified
sampling using 40 strata and 100 labeled samples was 0.48.

The variances illustrated by the parametric curves in Figure 3.9 
are attributable to the procedure's sampling strategy and estimation 
technique. It is of interest to examine how these variances compare to 
those contributed by other components such as the labeler. The two 
dashed lines appearing at 5.4% and 9.2% illustrate the RMS between-segment 
error for the spring wheat estimate of the developmental and other seg- 
ments respectively. This clearly displays that the sampling efficiency 
of the procedure is well within the limits of accuracy provided by the 
labeling mechanism. In other words, the Procedure M framework is 
accurate and efficient with respect to the labeling source. 

Table 3.8 is provided for completeness and contains summary sta- 
tistics on a segment basis of the Procedure M evaluation using 40 strata 
and 100 labeled samples. 
TABLE 3.8 SEGMENT PERFORMANCE RESULTS

40 Strata, 100 Labeled Samples, 50 Replications (all values in percent)

Column groups: 2-class (Tot. Spring Sm. Grains), columns 1-4;
3-class Procedure M: Raw Spring Wheat, columns 5-7; Raw Other Spring Sm. Grains,
columns 8-10; Unknown Spring Sm. Grains, columns 11-12.

               (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)      (10)      (11)      (12)
              Mean  Sampling  Expected                Mean Std. Dev.                Mean Std. Dev.                Mean Std. Dev.
Segment   Estimate Std. Dev.      Bias     Truth  Estimate   of Est.     Truth  Estimate   of Est.     Truth  Estimate   of Est.  Comment
1104           3.9       1.7      -0.2       4.6         0         0       0.0       3.9       1.7       4.6         0         0
1498          29.5       3.0       1.4      28.9      13.3       2.3       8.6      16.1       2.7      20.3       0.1       0.2
1512          27.9       1.8      -0.0      29.8      15.6       2.4      10.4       7.4       1.7      19.5       4.9       1.9
1513          78.7       2.0       6.0      71.6       0.4       0.1      52.3       0.4       0.2      19.3    (77.9)       2.0  Rejected
1515          61.9       2.9       2.6      60.6      30.9       3.6      36.8      18.1       2.8      23.8      12.9       2.8
1520          20.6       3.5       1.7      18.4      10.7       2.5      10.4       5.3       1.6       8.0       4.6       1.6
1602          34.8       3.9      -0.1      35.7      22.9       2.8      31.1       7.0       1.5       4.6       4.9       2.3
1606          36.6       2.9       3.6      31.9       7.6       2.4      24.5       3.6       1.5       7.5    (25.4)       3.2  Rejected
1614          41.5       3.2       2.8      38.5      16.5       2.5      25.1      20.4       3.0      13.5       4.6       1.7
1633          43.8       2.2       3.2      39.7      20.3       3.1      37.2      11.8       1.9       2.5      11.7       3.0
1637          38.8       2.6       2.5      35.0       9.7       1.9      31.9      28.4       2.6       3.2       0.7       0.7
1640          52.5       3.2       4.4      48.6      35.9       3.2      31.5      15.5       2.1      17.1       1.1       0.8
1642          61.4       3.1       5.7      53.5      28.3       3.5      38.7      17.7       2.6      14.8      15.4       3.3
1652          32.6       2.7       0.5      36.5       4.2       1.7      24.7      22.5       2.5      11.8       5.9       1.6
1662          52.7       2.9       5.4      44.9      20.2       2.6      36.8      32.2       2.3       8.1       0.3       0.4
1663          53.4       1.6       3.5      50.2      37.0       2.6      32.3      12.2       2.0      17.9       4.2       1.6
1669           7.4       1.9      -0.9       9.5       0.3       0.3       5.9       7.0       1.9       3.8       0.1       0.3
1681          37.1       2.7       1.7      34.9      16.4       2.4      15.6      20.2       2.3      19.2       0.5       0.6
1699          20.3       1.9       0.1      20.1       0.9       0.7       7.1      18.0       1.9      13.0       1.4       1.1
1800          28.4       2.8       1.6      26.8       1.9       0.9       0.5      26.3       2.8      26.3       0.2       0.3
1803           0.3       0.4       0.2       0.5       0.2       0.2       0.0       0.1       0.3       0.5         0         0
1805          13.1       2.6      -0.9      14.7       3.1       1.2       0.3       9.8       1.3      14.4       0.2       0.4
1811           2.7       1.4      -0.2       2.5       2.2       1.4       0.1       0.5       0.4       2.4         0         0
1899          65.3       3.4       5.6      58.9       7.3       1.6      28.6      13.4       2.5      30.3    (44.6)       3.8  Rejected
1913          13.0       2.5      -0.8      14.3       0.8       0.8      11.8      11.6       2.4       2.5       0.6       1.0
1927          31.8       2.9       1.6      29.9      11.5       2.4      16.7      15.7       2.3      13.3       4.6       1.8
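
The summary statistics quoted in Section 3.1.2.1 can be recomputed from the two-class columns of Table 3.8. The sketch below uses the mean estimate and truth for each of the 26 segments as transcribed from the table; because the table entries are rounded to one decimal place, small differences from the figures quoted in the text are to be expected.

```python
import math

# (segment, mean two-class estimate %, ground-truth %) transcribed from Table 3.8
TABLE_3_8 = [
    (1104, 3.9, 4.6), (1498, 29.5, 28.9), (1512, 27.9, 29.8), (1513, 78.7, 71.6),
    (1515, 61.9, 60.6), (1520, 20.6, 18.4), (1602, 34.8, 35.7), (1606, 36.6, 31.9),
    (1614, 41.5, 38.5), (1633, 43.8, 39.7), (1637, 38.8, 35.0), (1640, 52.5, 48.6),
    (1642, 61.4, 53.5), (1652, 32.6, 36.5), (1662, 52.7, 44.9), (1663, 53.4, 50.2),
    (1669, 7.4, 9.5), (1681, 37.1, 34.9), (1699, 20.3, 20.1), (1800, 28.4, 26.8),
    (1803, 0.3, 0.5), (1805, 13.1, 14.7), (1811, 2.7, 2.5), (1899, 65.3, 58.9),
    (1913, 13.0, 14.3), (1927, 31.8, 29.9),
]


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


def summarize(rows):
    """Return (mean estimate, mean truth, RMS error about the 45-degree line)."""
    mean_estimate = mean(est for _, est, _ in rows)
    mean_truth = mean(truth for _, _, truth in rows)
    rms_error = math.sqrt(mean((est - truth) ** 2 for _, est, truth in rows))
    return mean_estimate, mean_truth, rms_error


if __name__ == "__main__":
    mean_estimate, mean_truth, rms_error = summarize(TABLE_3_8)
    print(round(mean_estimate, 2), round(mean_truth, 2), round(rms_error, 2))
```

The RMS error about the 45° line here is the root mean square of the segment-level differences between estimate and truth.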





3.1.3 SUMMARY OF SYSTEM PERFORMANCE 

Procedure M configured for spring wheat has been parametrically 
evaluated using 26 LACIE Phase 2 and Phase 3 Blind Sites distributed 
across the Northern Great Plains. Encouraging results were achieved 
in estimating total spring small grain and spring wheat proportions.
Analysis of results showed that: 

1) Procedure M provided accurate total spring small grains 
proportion estimates with respect to the source of labels. 

2) High variance spring wheat estimates were made at the 
segment level. 

3) Poor spring wheat results in certain segments were
seemingly systematic in nature and probably related to
ancillary conditions; four segments exhibited the poorest results.
The aggregated estimate based on the remaining 19 of 23
segments for which spring wheat estimates were made exhibited
an absolute error of 0.5% and a relative error of only 3%,
implying that the spring wheat discriminant function was
accurate when employed within the appropriate stratum (excluding
Phase 2 and moisture-stressed segments; see Section 3.2.1).

4) Within-segment sampling variance is not a key issue with 
this procedure; accurate labeling of samples is critical. 

This overall evaluation does point to the need for a critical 
analysis of the component parts of Procedure M, in particular the spring 
wheat labeling mechanism. Section 3.2 provides evaluations of these
components. The overall mechanism and procedural concept is sound, 
exhibiting both accurate and efficient estimates of crops. Improvement 
in certain components may result in levels of accuracy for spring wheat
estimates that were not expected, given the problem's degree of difficulty.
3.2 TEST AND EVALUATION OF COMPONENT PERFORMANCE

In addition to evaluating the overall performance of Procedure M 
for spring wheat, the tests and analyses were conducted so that the per- 
formance of individual system components could be evaluated. Evaluations 
of four major components — machine labeler, stratification, spatial 
feature definition, and haze correction — are presented below. 


3.2.1 EVALUATION OF LABELER PERFORMANCE 

For purposes of machine labeler evaluation, 18 segments* were used.
This subset excludes the four segments used for development of the 
machine labeling criterion. An additional six segments had little or 
no spring wheat present. Since, as will be explained later, spring 
wheat labeling accuracy served as the primary indicator of labeling 
success, these six segments offered little or no additional information 
for evaluation, and were therefore excluded. The locations of the 
18 remaining segments are shown in Figure 3.10. 

3.2.1.1 Results

Overall accuracy of labeling for spring wheat and barley pixels 
was 53%, with 81% of the barley pixels receiving correct labels com- 
pared to 45% of the spring wheat pixels. Table 3.9 summarizes the 
overall results by crop and year. (See Appendix E for a segment-by- 
segment breakdown of results for the entire test set.) 

Figure 3.11, which is a plot of overall accuracy by segment 
(ordered according to decreasing small grains percentage), illustrates
the wide variability of results. However, a sub-grouping of the seg- 
ments is also suggested by the graph. By dividing the 18 segments into 
two groups based on labeling accuracy (greater than 50%, or less than 
50%), a clearer understanding of labeler performance can be gained. 

Table 3.10 summarizes the labeling accuracy for these two groups, 
again by crop and year. 

*Sixteen were drawn from the 26 described in Section 3.1. Two additional
segments were available for this evaluation but were not available for 
full scale testing due to incomplete wall-to-wall ground truth. 
FIGURE 3.10 LOCATION OF SITES USED IN LABELER PERFORMANCE ANALYSIS
(developmental sites marked)


TABLE 3.9 LABELING ACCURACY FOR SPRING WHEAT
AND BARLEY PIXELS

                   Phase 2    Phase 3    Total
Spring Wheat         43%        51%       46%
Barley               72%        86%       81%
Overall              46%        60%       53%


TABLE 3.10 LABELING ACCURACY FOR TWO GROUPS OF SEGMENTS

                   "Good" Segments (> 50% Accuracy)      "Bad" Segments (< 50% Accuracy)
                   3 Phase 2 Sites                        7 Phase 2 Sites
                   3 Phase 3 Sites                        5 Phase 3 Sites

                   Phase 2    Phase 3    Total            Phase 2    Phase 3    Total
Spring Wheat         58%        71%       65%               33%         7%       25%
Barley               71%        80%       77%               74%        99%       91%
Overall              --         --        68%               --         --        34%
FIGURE 3.11 LABELING ACCURACY FOR SPRING WHEAT
AND BARLEY PIXELS, BY SEGMENT
Clearly, the factor that drives the overall accuracy down is the
reduced labeling accuracy for spring wheat pixels. In fact, barley 
labeling accuracy actually increased from the greater-than-50% (good) 
segments to the less-than-50% (bad) segments. 

Although the labeling criterion was developed using spring wheat 
and barley pixels only, the procedure is meant to label all the small 
grains (spring wheat, barley, oats, rye, and triticale). For the 28 segments
which comprised the entire data set, there were no triticale pixels
and only a few rye pixels. There were, however, enough oats pixels to 
provide an indication of the procedure's success in separating them from 
spring wheat pixels. Table 3.11 summarizes the results for oats. 

3.2.1.2 Evaluation of Results

As described earlier, the decision criterion for labeling is based 
on a distance in Brightness-Greenness space which increases from the 
time of heading through the dough stage of development. Since barley 
fields have been observed to ripen somewhat faster than spring wheat 
fields, a greater distance on any given day in the critical day range 
(defined in Section 2.2.1) should indicate a barley pixel. 

Consider, however, the result of a localized increase in the rate 
of crop development. In this situation, distances on a given day should 
tend to be greater than they would be under normal conditions. One should 
find, then, that both crops tend to be a greater distance from the ref- 
erence line than that selected as the spring wheat/barley discrimination 
value. Thus spring wheat pixels would be mistaken for barley pixels, 
while barley pixels would be even more likely to be correctly labeled. 

This is precisely the result we see in Table 3.10, suggesting that the
poor results were indeed caused by an increase in the crop development rate.
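
The argument above can be illustrated with a toy threshold rule. In the sketch below, a pixel whose distance from the reference line exceeds a fixed discrimination value is labeled barley, and accelerated crop development is mimicked by multiplying all distances by a common factor. The distances, the threshold, and the scaling factor are hypothetical values chosen only to show the mechanism; they are not taken from the machine labeler described in Section 2.

```python
THRESHOLD = 8.0  # hypothetical spring wheat/barley discrimination value

# Hypothetical distances from the reference line under normal development
WHEAT_DISTANCES = [4.0, 5.0, 6.0, 7.0]
BARLEY_DISTANCES = [9.0, 10.0, 11.0, 12.0]


def label(distance, threshold=THRESHOLD):
    """Label a pixel by its distance: greater distance indicates barley."""
    return "barley" if distance > threshold else "spring wheat"


def accuracy_by_crop(wheat, barley, development_factor=1.0):
    """Return (wheat accuracy, barley accuracy) after scaling all distances."""
    wheat_ok = sum(label(d * development_factor) == "spring wheat" for d in wheat)
    barley_ok = sum(label(d * development_factor) == "barley" for d in barley)
    return wheat_ok / len(wheat), barley_ok / len(barley)


if __name__ == "__main__":
    print(accuracy_by_crop(WHEAT_DISTANCES, BARLEY_DISTANCES))        # normal
    print(accuracy_by_crop(WHEAT_DISTANCES, BARLEY_DISTANCES, 1.5))   # accelerated
```

With a fixed threshold, faster development lowers spring wheat accuracy while barley accuracy stays high, which is the pattern described for the "bad" segments.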

While such an increase could be the result of a number of factors, 
one of the most likely candidates is environmental stress, and particularly 
moisture stress. This stress could take the form of perennially low 
moisture (i.e., arid regions) or exceptionally low moisture (i.e., drought).
In either case, however, the effect should be felt over a larger region
than a single segment. If moisture stress is indeed the cause of the
change in development rate, then one would expect to see a geographical
clustering or delineation between good and bad segments.

TABLE 3.11 LABELER RESULTS FOR OATS PIXELS

                              Label Assigned
                     Spring Wheat    Other Spring Small Grains
Overall                  40%                  60%
Good Segments            56%                  44%
    Phase 2              60%                  40%
    Phase 3              51%                  49%
Bad Segments             18%                  82%
    Phase 2              23%                  77%
    Phase 3               2%                  98%

Although such a separation is not apparent in Phase 2 (perhaps only
because of the low number and limited distribution of Phase 2 segments
in the test set), the separation is readily apparent for the Phase 3 segments,
as shown in Figure 3.12. Segments located in the southwestern
or western portions of the region, where moisture stress is more likely,
yielded poorer labeler results than those in the less arid portions of
the region. The same geographical trend is evident in the labeling
results for oats. In the northeastern portion of the region (N. Dakota
and Minnesota), oats tended to fall into the spring wheat class, while
in the southern and western portions (S. Dakota and Montana), oats tended
to fall into the other spring small grains class. Finally, LACIE weather
summaries reported moisture stress for the general region of the bad
segments. Thus we conclude that moisture stress in part of the test
region caused an increase in the rate of crop development, which in
turn resulted in poor spring wheat labeling accuracy.

Since moisture stress should influence the green development profile, 
it might be possible to detect such a condition, using the profile, and 
adjust the decision criterion accordingly. For example, the estimated 
peak greenness value should indicate the vigor of the field being observed. 
Similarly, the rate of decrease in greenness from the peak value to the
value in the critical day range should be an indicator of the rate of 
crop development, with a steeper slope indicating a more rapid develop- 
ment rate. Preliminary examination of these and other such indicators 
is in progress. 
3.2.2 EVALUATION OF SPECTRAL STRATIFICATION

The technique currently employed to establish spectral strata from
among the set of quasi-fields formed is called BCLUST. This algorithm 
is described in Section 2.2.3. Its evaluation is discussed in this 
section. 

BCLUST conducts an unsupervised, multiple-pass clustering of spectral
means of quasi-field interiors to produce a fixed number of spectral
strata. These strata are formed to direct samples in a manner that,
compared to unstratified sampling, would reduce the variance of the
estimate made. The success of this procedure is measured by
its ability to group quasi-field means into strata that are all grain
or all non-grain. Ideally only two strata are needed: the group of
grain clusters, and the group of other clusters. Such a procedure would
require drawing only one sample to identify which stratum is which. This,
of course, has not been realized.

The variance reduction factor, R, was earlier defined and used
in describing the efficiency gained in Procedure M due to stratified
sampling. This R factor was determined using the measured variation of
the procedure. The availability of wall-to-wall ground truth permits
the use of a measure that is closely related to the variance reduction
factor, called the expected variance reduction factor (R_E factor).

The R_E factor can be used to evaluate the degree of separability
realizable at any level of stratification. This factor is used to evaluate
the performance of BCLUST and of BLOB.

The expected variance reduction factor, R_E, is defined as follows:

$$R_E = \frac{\sum_{i=1}^{m} \frac{n_i}{n} P_i (1 - P_i)}{P (1 - P)}$$

where

n_i is the number of pixels in stratum i
n is the number of pixels in all strata
m is the number of strata
P_i is the true grain proportion in stratum i
P is the true grain proportion in all strata
If the R_E factor is 0, then the strata are either pure grain or pure
other. If R_E = 1, the strata are no purer than the segment as a whole.
This latter case is equivalent to unstratified sampling; hence sampling
variance is not improved by such stratification. The value R_E = 0.5
is approximately equivalent to an 85% average purity among strata.
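
The three reference values just quoted (0, 1, and 0.5) can be checked with a short computation. The following Python fragment is a minimal sketch of the R_E definition, not the evaluation program used in this study; the function name and the example strata are hypothetical. It computes the pixel-weighted within-stratum term P_i(1 - P_i) and divides by P(1 - P).

```python
def expected_variance_reduction(pixels, grain_props):
    """Return R_E for strata with pixel counts n_i and true grain proportions P_i."""
    n = sum(pixels)
    # The overall grain proportion P is the pixel-weighted mean of the P_i.
    p_all = sum(n_i * p_i for n_i, p_i in zip(pixels, grain_props)) / n
    within = sum(n_i / n * p_i * (1.0 - p_i) for n_i, p_i in zip(pixels, grain_props))
    # Undefined (division by zero) when the segment itself is pure, P = 0 or P = 1.
    return within / (p_all * (1.0 - p_all))


# Hypothetical two-stratum examples.
pure = expected_variance_reduction([500, 500], [1.0, 0.0])     # all grain / all other
flat = expected_variance_reduction([300, 700], [0.4, 0.4])     # no purer than the segment
mixed = expected_variance_reduction([500, 500], [0.85, 0.15])  # 85% pure strata, P = 0.5
print(pure, flat, mixed)
```

In the last case the segment is half grain, so P(1 - P) = 0.25 and each stratum contributes 0.85 x 0.15 = 0.1275, which gives a ratio of 0.51, in line with the 85% purity figure quoted above.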
Figure 3.13 illustrates the R_E factor of the four strata cases
employed in the evaluation. The use of 20 strata significantly reduces
the R_E factor from unity, with further reductions with the employment
of 40 and 60 strata, although the latter two cases track closely.

BCLUST strata purity is ultimately limited to the purity of the set
of quasi-fields. The associated R_E factors for BLOB are listed in
Table 3.12 and plotted as the bottom line in Figure 3.13. In examining
the BLOB R_E factor, keep in mind that it is not a linear measure of
purity as a function of the grain proportion, P. As P approaches zero,
R_E will increase disproportionally due to the defining ratio expression.
Hence the R_E factors of the segments at the right hand side of Figure
3.13 are somewhat inflated. The R_E factor of Segment 1652 is large
for a different reason that is further discussed in Section 3.1.2.

Figure 3.13 illustrates, as well, that unless limited by available 
crop separability, additional reduction of this factor can be realized. 
The employment and evaluation of other stratification strategies is 
suggested, at least as a basis of comparison. 

3.2.3 EVALUATION OF SPATIAL FEATURE DEFINITION 

An integral component of Procedure M is the definition of spatial 
features that we call quasi-fields. Quasi-fields are used as labeling 
and sampling targets. In addition, scene stratification, described in 
Section 2.2.3, is based on an unsupervised clustering of the spectral
means of quasi-fields. Procedure M for spring wheat employs the BLOB 
algorithm, described in Section 2.2.4, to structure quasi-fields. A 
discussion and evaluation of the performance of BLOB in Procedure M is 
presented in this section. 
TABLE 3.12 BLOB PERFORMANCE STATISTICS

                     Quasi-Fields
          --------------------------------
                     Number     % of       Grain     R_E Factor  Purity (%)
Segment              With       Segment    Propor-   (Int.       (Int.       Exp.
Number    Number     Interiors  Covered    tion (%)  Pixels)     Pixels)     Bias

1513       1195        378       78.9       71.60     0.022       99.4        5.99
1515       1074        474       80.0       60.58     0.064       97.6        2.61
1899       1756        415       69.8       58.87     0.043       98.4        5.57
1642       1181        452       81.2       53.45     0.155       95.2        5.73
1663       1760        455       66.5       50.17     0.026       98.8        3.48
1640       1437        488       74.7       48.63     0.114       96.0        4.35
1662       1021        451       83.7       44.89     0.161       94.4        4.39
1633       1163        401       79.9       39.67     0.076       97.2        3.21
1614       1307        445       71.9       38.53     0.253       91.1        2.76
1652       1209        427       76.5       36.49     0.353       85.4        0.51
1602       1820        395       60.2       35.66     0.188       93.9       -0.08
1637       1296        474       79.5       35.04     0.102       96.4        2.54
1681       1248        477       78.6       34.87     0.071       97.2        1.65
1606       1927        455       63.1       31.89     0.107       96.1        3.60
1927        993        442       84.0       29.94     0.099       96.6        1.61
1512       1319        449       72.2       29.83     0.120       96.0       -0.02
1498       1027        470       81.5       28.90     0.136       95.3        1.42
1800       1233        510       79.7       26.76     0.109       96.4        1.58
1699        697        358       89.5       20.08     0.082       97.6        0.13
1520       1466        472       70.8       18.42     0.137       96.1        1.72
1805       1241        461       78.6       14.73     0.333       94.0       -0.89
1913        979        382       77.6       14.29     0.206       96.7       -0.82
1669        414        251       94.9        9.54     0.442       95.7       -0.89
1104        605        320       89.0        4.61     0.101       98.3       -0.24
1811       1574        478       72.4        2.46     0.115       98.7        0.23
1803        866        342       86.2        0.51     0.389       99.0       -0.23

Average                429       77.7%                            96.3%

A quasi-field is a set of spatially contiguous pixels that may or
may not correspond to a real farm field. A quasi-field is composed of
interior pixels and edge pixels. An interior pixel is one whose four
strong neighbors are in the same quasi-field. Procedure M samples from,
and bases its estimate only on, the set of quasi-fields with interior
pixels. Fields composed of only edge pixels thus form a stratum that
is not sampled.
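
The interior-pixel rule can be illustrated with a small sketch. The Python fragment below is not part of BLOB; it assumes that the four strong neighbors are the edge-sharing pixels above, below, left, and right, and that pixels on the image border are treated as edge pixels. The function names and the label map are hypothetical.

```python
def interior_mask(labels):
    """Mark pixels whose four edge-sharing neighbors carry the same quasi-field label."""
    rows, cols = len(labels), len(labels[0])
    mask = [[False] * cols for _ in range(rows)]
    # ASSUMPTION: image-border pixels lack a neighbor and are counted as edge pixels.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            here = labels[r][c]
            mask[r][c] = all(
                labels[r + dr][c + dc] == here
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            )
    return mask


def fields_with_interiors(labels):
    """Return the labels of quasi-fields that own at least one interior pixel."""
    mask = interior_mask(labels)
    return {labels[r][c] for r, row in enumerate(mask) for c, flag in enumerate(row) if flag}


# Hypothetical 4 x 5 label map containing three quasi-fields.
labels = [
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
    [3, 3, 3, 2, 2],
]
sampled = fields_with_interiors(labels)
```

In this example only Quasi-Field 1 has an interior pixel; Quasi-Fields 2 and 3 consist entirely of edge pixels and would fall into the unsampled stratum.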

On the average 429 quasi-fields with interior pixels were formed 
by the BLOB algorithm in each of the 26 segments used in the procedural 
evaluation. This stratum covered 78% of each segment on the average, 
with 22% not being sampled. Table 3.12 presents information about the 
performance of BLOB on a segment-by-segment basis. Elaboration follows. 

The major criterion to be used in evaluating BLOB performance is 
whether the quasi-fields formed properly represent the contiguous areas 
of spring small grain and other classes. Since a subset of quasi-fields 
formed in a segment are used as labeling targets, it is important to 
evaluate how these structures are visually presented to the analyst
interpreters, and to determine what advantages may arise in labeling
quasi-fields rather than dots (as in LACIE Procedure 1).

Figure 3.14 is a typical map of the interior of quasi-fields, produced 
in Montana Segment 1929. In a region where strip cropping is practiced, 
field structure is strikingly apparent and road boundaries are visible. 
Sixty of these quasi-fields have been outlined to illustrate a set of 
labeling targets that could be presented to an analyst. Nearly all of 
these quasi-fields are associated with real fields. Mixture pixels are 
not singled out as targets, though small fields containing only one 
interior pixel do appear. 

Several measures of BLOB performance have been utilized. One
measure, the R_E factor, is related to the minimum variance that a
subsequent sampling procedure can expect to achieve. The R_E factors for
quasi-field interiors are listed in Table 3.12. The factor is related
to the percent purity measure also listed in Table 3.12. Over all
segments, the interiors of quasi-fields formed were found to be 96.3% pure;
that is, only 3.7% of the time were they found to be a mixture of spring
small grain and other. Figure 3.15 illustrates the purity factor for
each segment. Quasi-field edge pixels obviously were more mixed than
interior pixels. Interiors, which would operationally be used as
labeling targets, were found to be composed of a single crop class of interest.
This makes feasible the assignment of single class, rather than
proportional, labels to fields. In addition it is conjectured that the
elimination of any need to label mixture pixels makes the task of labeling
quasi-fields a feasible one, more so than that of dot labeling.

FIGURE 3.14 MAP OF INTERIORS OF QUASI-FIELDS, SEGMENT 1929

3.2.4 EVALUATION OF ATMOSPHERIC HAZE CORRECTION 

During the development of the spatially varying XSTAR haze correction 
procedure, attention was focused on computational cost as well as haze 
correction performance as criteria for evaluating the algorithm. The 
computational cost was observed to be primarily a function of the block 
size used to quantize the moving window aspects of the procedure. This 
relation between computational cost and block size is illustrated in 
Figure 3.16. We had expected that, while the computational cost would 
increase as the block size was decreased, the performance should increase 
with decreasing window size until the moving window became too small to 
provide a statistically representative haze diagnostic for the procedure. 
What we observed with respect to performance, however, is shown in
Figure 3.17. Although the figure does not indicate performance for
windows smaller than 15 lines by 15 pixels (measured between half
amplitude points), in general we observed no evidence that the performance
leveled off as the window size decreased below the 15 x 15 size. We
decided that the 15 x 15 window size was the smallest practical size for
the window, however, because a smaller window would have required a block
size smaller than 5 x 5 for proper performance, and the computational cost
began to increase dramatically for block sizes smaller than 5 x 5.

FIGURE 3.15 QUASI-FIELD PURITY (%), BY SEGMENT NUMBER

Segment 1652 is the only segment whose apparent interior purity
was less than 90%. It was found that the ground truth used to label
many of the strip fields in the segment did not distinguish between
individual grain and other strips, whereas the BLOB algorithm often
could separate the two. The evaluation programs assumed that any blob
labeled 'Strip' was 50% grain and 50% other, resulting in an artificially
lower average purity.

FIGURE 3.16 COST OF SPATIALLY VARYING XSTAR HAZE CORRECTION
AS FUNCTION OF BLOCK SIZE USED
(Vertical axis: percentage increase in cost relative to global XSTAR
correction. Horizontal axis: block size, lines x pixels.)

FIGURE 3.17 PERFORMANCE OF SPATIALLY VARYING XSTAR HAZE CORRECTION
ON FOUR CONSECUTIVE-DAY DATA SETS WITH VARYING HAZE CONDITIONS
(Vertical axis: percentage reduction in RMS error, pixel by pixel, in
removing differences in consecutive day data, relative to global XSTAR
correction. Horizontal axis: effective dimensions of low-pass moving
window filter, in pixels.)

Since the spatially varying haze correction procedure applies a dif- 
ferent correction to each pixel, performance has been evaluated by measuring 
the pixel by pixel differences between approximately equivalent scenes 
(consecutive day Landsat acquisitions) before and after correction. The 
alternative would have been to use ground reflectance measurements from 
representative scenes to establish an NEΔρ (Noise Equivalent Change in
Reflectance) performance figure; however, present reflectance data are 
too sparse to provide a proper evaluation of the spatially varying nature 
of the algorithm. On the other hand, one inherent limitation in using 
pixel by pixel differences between haze corrected consecutive day scenes 
as a measure of preprocessing performance is that distortions introduced 
to both scenes in an equivalent way (e.g., due to misleading haze diag- 
nostics from non-vegetated portions of the scene) are not measured. How- 
ever, an approximate assessment of these distortions can be made by 
looking for areas of abnormal contrast in a corrected image. In general 
the corrected images have been found to exhibit only minor distortions 
of this sort, while the beneficial aspects of the correction are dra- 
matically apparent in images wherever non-uniform haze is present. Thus, 
the performance of the spatially varying haze correction has been sta- 
tistically measured by calculating the root-mean-squared Euclidean dis- 
tance between registered pixels in consecutive day Landsat data before 
and after correction. (This performance measure uses only pixels which 
have passed the screening procedure on both days of a consecutive day 
acquisition.) Some example results of this type are presented in 
Table 3.13 for three scenes in which significant spatial haze
variations were apparent on one or both days. The RMS error figures in this
table include "error" contributions from some effects other than haze
variations. These other effects include bidirectional effects in crop
appearance (due to an approximate 6° change in view angle from one day
to the other), misregistrations of pixels between acquisitions (which
were minimized, but could not be completely removed), and quantization
effects (due to the digital nature of the data). These other effects
set a lower bound of approximately 3 to 6 counts on the RMS error figure,
depending on the scene being processed. For scenes with uniform haze
conditions, the spatially varying XSTAR performance is equivalent to the
global XSTAR performance which has been reported previously [33]. For
scenes with non-uniform haze conditions, the spatially varying XSTAR
performance is a significant improvement over that of the global
procedure, as indicated in Table 3.13. We estimate, based on tests over
numerous consecutive day acquisitions [33], that the XSTAR haze
correction approximately doubles the amount of data which is amenable to
signature extension or multisegment training applications.

TABLE 3.13 RMS ERROR IN REMOVING DIFFERENCES IN CONSECUTIVE-DAY
DATA (IN LANDSAT COUNTS)

                   Untrans-
                   formed (UT)   Global XSTAR            Spatially Varying XSTAR
                   -----------   ---------------------   -----------------------
                   RMS           RMS      Improvement    RMS      Improvement
Segment            Error         Error    Over UT        Error    Over UT

#1619, 77175-6     11.7*         10.1*    13.5%          9.1*     21.7%
#1640, 77139-40    13.7          11.3     17.7%          9.8      28.8%
#1927, 77193-4     16.2          11.5     29.1%          9.3      42.8%

* Non-atmospheric effects set a lower bound of 3 to 6 counts on this
  RMS error figure, depending on the scene being processed.
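
The performance measure used in this section, the root-mean-squared Euclidean distance between registered pixels on consecutive days, can be sketched in a few lines of Python. This is an illustrative fragment, not the evaluation software used here. It assumes the two inputs are equal-length lists of pixel vectors that have already been registered and screened, and it assumes that "improvement over UT" means the relative reduction in RMS error; the function names and pixel values are hypothetical.

```python
import math


def rms_distance(day1, day2):
    """Root-mean-squared Euclidean distance between registered pixel vectors."""
    total = 0.0
    for a, b in zip(day1, day2):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(day1))


def improvement_percent(rms_before, rms_after):
    """Percentage reduction in RMS error (ASSUMED definition of 'improvement')."""
    return 100.0 * (rms_before - rms_after) / rms_before


# Hypothetical four-channel pixels on two consecutive days, in counts.
day1 = [(30, 20, 40, 35), (28, 22, 41, 33)]
day2_raw = [(33, 24, 40, 35), (28, 22, 44, 37)]
day2_corrected = [(30, 20, 43, 39), (28, 22, 41, 33)]

before = rms_distance(day1, day2_raw)
after = rms_distance(day1, day2_corrected)
gain = improvement_percent(before, after)
```

Because the measure is a difference between two days, a distortion applied equally to both scenes leaves it unchanged, which is the limitation noted earlier in this section.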
4

SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS 


4.1 SUMMARY 

Procedure M, an objective multicrop area inventory procedure, has 
been defined. It is a modular system with state-of-the-art components 
which is readily modified or configured for different applications. In 
the LACIE context, Procedure M's major differences from LACIE Procedure 1
are: (a) additional preprocessing to correct for atmospheric haze varia- 
tions and to perform other normalizations and transformations, (b) defini- 
tion and use of more strata, (c) definition, selection, and labeling of 
quasi-field interiors in the scene instead of labeling individual pixels, 
and (d) proportion estimation without maximum likelihood classification. 
Procedure M evolved from, and incorporates developments and understanding 
gained from, a series of supporting research and technology tasks that 
have been pursued at ERIM, as well as other organizations in JSC's SRT 
community. 

Procedure M was configured for spring wheat inventory by including 
a two-step labeling process. First, an analyst labels each sampled quasi- 
field as either 'Spring Small Grain' or 'Other'. Then a machine labeler 
refines the label of the 'Spring Small Grain' samples, assigning either
a proportional label between 'Spring Wheat' and 'Other Spring Small Grain'
or the label 'Unidentifiable Spring Small Grain'. The machine labeler
makes use of a temporal profile of the Greenness component of Landsat data
to estimate crop calendar shifts and to detect the less rapid maturation or
brightening of wheat.
4.2 CONCLUSIONS

Extensive tests of the spring wheat configuration of Procedure M 
were made using Landsat data from 26 LACIE blind test sites in North 
Dakota, South Dakota, Minnesota, and Montana — five from Phase 2 
(Summer 1976) and the remainder from Phase 3 (Summer 1977). The tests
were designed to assess both the bias and variance of the procedure's 
performance in estimating crop areas, by use of 50 replicates (estimates 
using different selections of quasi-fields for labeling) for each test 
case. In addition to testing the configuration's design of 40 spectral 
strata and 100 samples for labeling, 19 other combinations of strata 
and samples were tested. 

Accurate two-class (spring small grains vs. other) proportion
estimates were achieved in tests using ground truth labels as a substitute
for analyst labels. Only a slight absolute bias (<2%) was observed and
this was found to be primarily due to not sampling those (small)
quasi-fields which are without interior pixels. The use of 40 spectral strata
and labeling of 100 quasi-fields provided low-variance proportion
estimates (standard deviation, σ = 2.5%). The average reduction of variance
factor was 0.34 for the procedure. Use of more strata or more samples
does not appear to be warranted.

Encouraging but less accurate three-class (spring wheat, other
spring small grains, and other) performance was found when results were
aggregated over the 23 segments for which proper acquisitions were
available. The absolute bias was relatively low (-2.6% for spring wheat
and +3.8% for other spring small grains) and the variance attributable
to sampling was about the same as for two classes. On the other hand,
segment-to-segment variance, largely attributable to labeling errors,
was much larger (σ = 9.2%). In analyzing these results, systematic
spring wheat errors were noted for certain subsets of segments. The
fixed decision rule employed by the machine labeler was developed using
four Phase 3 sites. Spring wheat errors were greatest for Phase 3 sites
to the far West and Southwest of the development sites and for two of the
five Phase 2 sites. Nineteen of the twenty-three segments were found
to be in a stratum in which it was reasonable to employ the spring wheat
discriminant developed on four segments. The absolute error of the
spring wheat estimate in this stratum was less than 0.5% with a relative
error of 3%.

Moisture stress is a likely cause of many of the spring wheat label- 
ing errors. Moisture stress accelerates the rate of maturation of grains, 
the characteristic being used in the labeler. Indications of moisture 
stress were found in collateral data for those Phase 3 areas where per- 
formance was poorest, leading to hopes that an improved machine labeler 
can be developed in the future. 

In developing Procedure M we gained understanding and incorporated 
improvements in other system components besides the labeler. An unbiased 
procedure for sampling quasi-fields for labeling was developed and improve- 
ments were made in our spectral stratification procedure. Increased know- 
ledge of the BLOB algorithm led to the development of a standard parameter 
set for use in wheat inventory, as well as a method for maintaining field 
definition in spite of cloud-covered data. Also, haze correction pro- 
cedures were improved by the development and implementation of a version 
that applies a spatially varying correction. 

4.3 RECOMMENDATIONS

It is recommended that Procedure M tests be expanded to include 
analyst labeling of quasi-fields, a step that has not yet been tested. 

In addition, efforts to develop a moisture stress indicator should be 
undertaken and the mechanism, when developed, should be incorporated 
into the machine labeler to allow appropriate localized adjustment of 
the spring wheat decision rule. Continued development of machine label- 
ing techniques for these and other crops is recommended. 
It is also recommended that efforts be addressed to possible
improvements in other components of Procedure M. The bias caused by not sampling
edge pixels and small fields should be analyzed and corrected. Additional
improvements in the spectral stratification technique can be expected to
further reduce the sampling variance. Finally, the application of
Procedure M to additional crops, such as corn and soybeans in segments
acquired during the 1978 season, and performance evaluation are
recommended.

As was mentioned earlier, JSC's Procedure 1 is an initial implemen- 
tation of a statistical sampling viewpoint applied to the spectral domain 
of remotely sensed data. Procedure M carries this development further in 
several important respects: by the use of state-of-the-art preprocessing,
normalization and feature extraction techniques based on a physical
interpretation of Landsat MSS data; by extending the spectral stratification
concept with multiple strata to produce improved sampling efficiency; by
using an unbiased technique of cluster sampling; by providing natural 
(field-like) labeling targets; and by automatic labeling applied to an 
especially difficult discrimination problem, spring wheat from other 
spring small grains. As mentioned above, further development in these 
various component areas is recommended. In addition, in the longer term, 
further synthesis of classification and sampling viewpoints, and research 
toward that goal is recommended. Procedure M is one realization of a 
flexible, modular, and efficient testbed which can be used to test ad- 
vanced procedures which will derive from such a synthesis. 
APPENDIX A

PROOF OF UNBIASEDNESS OF MIDZUNO SAMPLING TECHNIQUE


The Midzuno sampling technique is described and illustrated by
example in Section 2.2.2. This appendix presents both an algebraic
proof of its unbiasedness and an empirical demonstration.

A.1 ALGEBRAIC PROOF

We suppose that

S is a sample of k fields to be chosen
B is the number of fields in the stratum
n_i is the number of pixels in the i-th field
p_i is the proportion of wheat in the i-th field
N is the total number of pixels in the stratum

We will prove that the proportion of wheat in the sample

$$\hat{p} = \frac{\sum_{i \in S} n_i p_i}{\sum_{i \in S} n_i}$$

is an unbiased estimate of the proportion of wheat in the stratum

$$P = \frac{\sum_{i \in \text{stratum}} n_i p_i}{N}$$

We first show that, as in the example, the technique chooses a
sample with probability proportional to the size of the sample (i.e.,
the number of pixels in the sample).

The sample S of k fields is chosen in one of k distinct ways depending
on which field, i, is chosen first. (In Procedure M for spring wheat,
quasi-fields or blobs are the entities sampled and labeled.) The
probability of one of the ways is

$$\frac{n_i}{N} \cdot \frac{1}{\binom{B-1}{k-1}}$$

because the first field is chosen with probability n_i/N, proportional
to size, and the remaining k-1 fields are chosen with equal probability
from among the $\binom{B-1}{k-1}$ such subsets. Thus the subset of k-1
fields that complete S is chosen with probability $1/\binom{B-1}{k-1}$.
When the k terms of the sample probability are added up, one term for
each way, the result is

$$\frac{\sum_{i \in S} n_i}{N \binom{B-1}{k-1}}$$

and thus the sample probability is proportional to the size of the sample
(as measured in pixels).

The sample estimate $\hat{p}$ is

$$\hat{p} = \frac{\sum_{i \in S} n_i p_i}{\sum_{i \in S} n_i}$$

The expected value of $\hat{p}$ is obtained by multiplying the sample
probability by $\hat{p}$ and summing over all possible samples. In symbols

$$E\hat{p} = \sum_{\substack{\text{all samples } S \\ \text{of } k \text{ fields}}} \frac{\left(\sum_{i \in S} n_i\right) \left(\sum_{i \in S} n_i p_i\right)}{N \binom{B-1}{k-1} \sum_{i \in S} n_i}$$

As in the example, the number of pixels in the sample,
$\sum_{i \in S} n_i$, cancels out of numerator and denominator.
We are left with

$$E\hat{p} = \frac{\sum_{\substack{\text{all samples } S \\ \text{of size } k}} \sum_{i \in S} n_i p_i}{N \binom{B-1}{k-1}}$$

Any one field, i, will occur in exactly $\binom{B-1}{k-1}$ samples
because the other fields in the sample can be chosen in that many ways.
A term n_i p_i will occur in the numerator that many times, once for
each possible sample in which Field i occurs. Thus the numerator is

$$\binom{B-1}{k-1} \sum_{i=1}^{B} n_i p_i$$

and hence

$$E\hat{p} = \frac{\sum_{i=1}^{B} n_i p_i}{N} = P$$

Q.E.D.


A.2 DEMONSTRATION

A FORTRAN program was written to try out the Midzuno technique and
compare it with simple random sampling. We defined a stratum with seven
quasi-fields as follows:

Number of Pixels    Percent Wheat

        5                10
       10                20
       20                30
       40                40
       70                60
      100                75
      150                90

It works out that there is exactly 70% wheat in the stratum. We ran
the program choosing subsets of three fields, first with 400 then with
10,000 replications. The results were as follows:

                          Sample           Stand.
Scheme     Replications   Mean     True    Dev.       t      Significance

Simple          400       65.6     70.0    16.0     -5.50    0.0000001
Midzuno         400       69.2     70.0    13.2     -1.25    0.21
Midzuno      10,000       69.9     70.0    12.5     -0.95    0.34


Using the conventional 0.05 level for determining whether a bias is 
significant, we found that the simple sampling scheme was significantly 
biased and the Midzuno scheme was not. 

The simple scheme had a significantly larger variance than the 
Midzuno scheme as judged by an F test. The variance ratio of 1.45 was 
significant at the 0.001 level. 
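
Because the seven-field stratum has only 35 subsets of three fields, the expected value of the estimate can also be computed exactly by enumeration rather than by replication. The Python fragment below is an independent sketch, not the FORTRAN program described above. It applies the sample probability derived in Section A.1 for the Midzuno scheme and equal subset probabilities for simple random sampling; the function and variable names are hypothetical.

```python
from itertools import combinations
from math import comb

# Stratum from Section A.2: (number of pixels, wheat proportion) for each field.
FIELDS = [(5, 0.10), (10, 0.20), (20, 0.30), (40, 0.40),
          (70, 0.60), (100, 0.75), (150, 0.90)]


def sample_estimate(sample):
    """Pixel-weighted wheat proportion of the sampled fields."""
    return sum(n * p for n, p in sample) / sum(n for n, _ in sample)


def expected_estimates(fields, k):
    """Exact expectation of the sample estimate under both sampling schemes."""
    total = sum(n for n, _ in fields)
    ways = comb(len(fields) - 1, k - 1)
    samples = list(combinations(fields, k))
    # Midzuno: P(S) = (pixels in S) / (N * C(B-1, k-1)), as derived in Section A.1.
    midzuno = sum(
        sum(n for n, _ in s) / (total * ways) * sample_estimate(s) for s in samples
    )
    # Simple random sampling: every subset of k fields is equally likely.
    simple = sum(sample_estimate(s) for s in samples) / len(samples)
    return midzuno, simple


midzuno_mean, simple_mean = expected_estimates(FIELDS, 3)
print(round(100 * midzuno_mean, 2), round(100 * simple_mean, 2))
```

Under these assumptions the Midzuno expectation equals the true 70%, while the simple-sampling expectation falls below it, which agrees in direction with the replication results tabulated above.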
APPENDIX B

PARAMETER VALUES OF BCLUST AND BLOB


B.1 BCLUST PARAMETER VALUES

The distance function used in BCLUST (See Section 2.2.3) is defined
as

$$\sum_{j=1}^{\mathrm{nchan}} w_j^2 \, (x_j - \bar{x}_{ij})^2$$

where

x is the data vector
x̄_i is the mean vector of Cluster i
nchan is the number of multispectral channels
w_1, ..., w_nchan is a set of weights

If the distance to the closest cluster is greater than τ, a new cluster
is formed with its mean at x.

w_1, ..., w_nchan and τ are parameters of the algorithm. Each setting
of the parameters produces a different result. The question is, which
setting to use?

We observe that if w_1, ..., w_nchan and τ are all multiplied by the
same constant, the algorithm is unchanged. We note that if τ is increased,
the number of clusters formed decreases (or may stay the same) and vice
versa.


Our performance measure for setting parameters is the R factor (See 
Section 3.1.1) which measures the purity of the clusters. A clustering 
that purely separates the crops of interest has an R factor of 0, whereas 
one that produces a constant proportion in all clusters has an R factor 
of 1. Thus the smaller the R factor, the better the score of the para- 
meter setting. 
A complication is that the R factor tends to go down as the number
of clusters goes up. Therefore to make the parameter runs comparable,
the number of clusters must be held fixed.

The following experiment was run to obtain a reasonably good set
of weights w_1, ..., w_nchan. BCLUST was run in a multisegment mode on
nine segments in Kansas. Winter wheat was the crop of interest. There
were six data channels: the Tasseled Cap variables Brightness and
Greenness in each of the first three biophases.

The starting point of the search of six-dimensional space was a 
set of weights in inverse proportion to the ranges of the values of the 
variables. For each setting of the weights, BCLUST was run repeatedly 
with converging values of τ until just 90 clusters were obtained. Then
the R factor for the 90-cluster run was recorded as the score for that 
set of weights. The search pattern was to follow the path of steepest 
descent to a setting of the weights with the smallest R factor. 

It happened that the optimal setting was found at the starting
point. Any change in weight in any variable or likely combination of 
variables resulted in a higher R factor. The values of the weights at 
the optimal setting are given in Table B.l. Physically speaking, these 
values are no surprise. In Biophase 1, the fields are predominantly bare 
soil, which has a greater variation in Brightness than green vegetation. 
Hence a small weight (inversely proportional to the effective range) is 
placed on Brightness, Phase 1. In Biophases 2 and 3, crop development 
results in greater variation in the Greenness direction and hence smaller 
weights than for Greenness, Phase 1. 

As for the parameter τ, it is usually determined by the number of
clusters wanted. BCLUST has the capability of making repeated runs with
appropriate changes in τ until the wanted number of clusters is obtained
(with a little leeway).
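
The threshold rule and the repeated-run search for the threshold can be sketched as follows. This Python fragment is only an illustration and is not BCLUST itself: it makes a single pass over the data, updates cluster means as running averages, and bisects on the threshold, all of which are assumptions made for the example (the multiple-pass details are in Section 2.2.3). The function names, data points, and search bounds are hypothetical.

```python
def weighted_distance(x, mean, weights):
    """Sum over channels of w_j^2 * (x_j - mean_j)^2."""
    return sum((w * (a - b)) ** 2 for w, a, b in zip(weights, x, mean))


def leader_cluster(points, weights, tau):
    """Single pass: join the closest cluster, or start a new one if it is farther than tau."""
    means, counts = [], []
    for x in points:
        if means:
            dists = [weighted_distance(x, m, weights) for m in means]
            best = min(range(len(means)), key=dists.__getitem__)
        if not means or dists[best] > tau:
            means.append(list(x))
            counts.append(1)
        else:
            # ASSUMPTION: the cluster mean is updated as a running average.
            counts[best] += 1
            c = counts[best]
            means[best] = [m + (a - m) / c for m, a in zip(means[best], x)]
    return means


def search_tau(points, weights, target, lo, hi, max_iter=50):
    """Bisect on tau until the cluster count equals target (hypothetical helper)."""
    for _ in range(max_iter):
        tau = 0.5 * (lo + hi)
        n = len(leader_cluster(points, weights, tau))
        if n == target:
            return tau
        if n > target:   # too many clusters: raise the threshold
            lo = tau
        else:            # too few clusters: lower the threshold
            hi = tau
    return None


# Hypothetical two-channel data forming three tight groups.
points = [(10, 10), (11, 10), (10, 11), (50, 50), (51, 50), (50, 51),
          (90, 10), (91, 10), (90, 11)]
weights = (1.0, 1.0)
tau = search_tau(points, weights, target=3, lo=0.0, hi=10000.0)
```

The bisection relies on the behavior noted in this appendix, that raising the threshold does not increase the number of clusters; for a single-pass scheme with moving means this need not hold strictly, so the search returns None if no threshold in the interval yields the wanted count.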
TABLE B.1 OPTIMAL BCLUST WEIGHTS DETERMINED FOR WINTER WHEAT
ESTIMATION IN NINE SEGMENTS IN KANSAS

Tasseled-Cap
Channel          Biophase    Weight

Brightness          1         0.6
Greenness           1         1.3
Brightness          2         0.8
Greenness           2         1.0
Brightness          3         0.7
Greenness           3         1.0

Weights are in inverse proportion to the effective ranges
of the variables.
The number of clusters, in turn, is chosen with regard to sampling
considerations. It would be a mistake to have more clusters (strata)
than the size of the sample of quasi-fields to be labeled because then
some clusters would not be sampled at all, possibly leading to
considerable bias. An equal number of clusters and sampled quasi-fields is not
too satisfactory either because the large clusters cannot be awarded
increased sample size. In general, sampling proportionate to the size
of the cluster produces efficient sampling.

So the number of clusters ought to be small enough to allow sampling 
approximately in proportion to size. If such sampling leaves out some 
small clusters, then the total of their pixels should be small enough 
not to introduce significant bias. Within this constraint, the number 
of clusters should be as large as possible, because the greater the 
number of clusters, the purer they are with respect to the ground truth. 
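
The trade-off described above can be made concrete with a small allocation sketch. The Python fragment below distributes a fixed number of samples over clusters in proportion to their size and reports the fraction of pixels left in unsampled clusters. The largest-remainder rounding rule is an assumption chosen for the example, not a rule taken from Procedure M, and the cluster sizes and function names are hypothetical.

```python
def allocate_samples(cluster_sizes, n_samples):
    """Allocate samples in proportion to cluster size (largest-remainder rounding)."""
    total = sum(cluster_sizes)
    quotas = [n_samples * size / total for size in cluster_sizes]
    alloc = [int(q) for q in quotas]
    # Hand the leftover samples to the clusters with the largest fractional parts.
    order = sorted(range(len(quotas)), key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in order[: n_samples - sum(alloc)]:
        alloc[i] += 1
    return alloc


def unsampled_fraction(cluster_sizes, alloc):
    """Fraction of pixels lying in clusters that received no sample."""
    missed = sum(size for size, a in zip(cluster_sizes, alloc) if a == 0)
    return missed / sum(cluster_sizes)


# Hypothetical cluster sizes (in pixels) and a sample of 10 quasi-fields.
sizes = [5000, 3000, 1500, 400, 100]
alloc = allocate_samples(sizes, 10)
missed = unsampled_fraction(sizes, alloc)
```

Here the two smallest clusters receive no sample and together hold 5% of the pixels; whether that is small enough not to introduce significant bias is the judgment the text describes.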


B.2 PARAMETER VALUES OF BLOB


A form of BLOB's distance function (See Section 2.2.4) that favors
rectangular fields is defined as

$$\sum_{j=1}^{\mathrm{nchan}} \frac{(x_j - \bar{x}_{ij})^2}{v_j} + \max\left[ \frac{(\ell - \bar{\ell}_i)^2}{v_\ell}, \; \frac{(p - \bar{p}_i)^2}{v_p} \right]$$

where

x is the spectral data vector of a pixel
x̄_i is the spectral mean vector of Quasi-Field i
ℓ and p are the pixel line and point numbers rotated
to measure distance north-south and east-west
ℓ̄_i and p̄_i are the mean rotated line and point numbers
for Quasi-Field i
nchan is the number of spectral channels
v_1, ..., v_nchan, v_ℓ and v_p are weights

If the distance from the pixel to the closest quasi-field is greater
than τ, a new quasi-field is formed with its spectral mean at x and its
line and point coordinates at ℓ and p.
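
A direct transcription of this distance into Python is given below as a sketch. It is not the BLOB implementation; the function name and all numerical values, including the weights and the threshold, are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def blob_distance(x, line, point, mean_x, mean_line, mean_point, v_spec, v_line, v_point):
    """Spectral term plus the larger of the two scaled spatial terms."""
    spectral = sum((a - b) ** 2 / v for a, b, v in zip(x, mean_x, v_spec))
    spatial = max((line - mean_line) ** 2 / v_line, (point - mean_point) ** 2 / v_point)
    return spectral + spatial


# Hypothetical pixel and quasi-field values.
d = blob_distance(
    x=(30.0, 12.0), line=11.0, point=23.0,
    mean_x=(28.0, 15.0), mean_line=7.0, mean_point=26.0,
    v_spec=(4.0, 9.0), v_line=4.0, v_point=9.0,
)
tau = 5.0                      # hypothetical threshold
starts_new_field = d > tau     # rule stated above: exceed tau -> new quasi-field
```

Taking the maximum of the line and point terms, rather than their sum, makes the set of positions within a given spatial distance of the field center a rectangle, which is one way to read the statement that this form favors rectangular fields.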

The pattern of quasi-fields depends upon what values of the para- 
meters V- . .... v , , V. , V and t are set.' A set’ of parameters suitable 

1’ ’ nchan’ l p 

for the problem of estimating winter wheat- in Kansas was determined as 
follows. 

First, the spectral weights $v_1, \ldots, v_{nchan}$ were obtained from
the corresponding weights determined for BCLUST by observing that
$1/v_j$ plays the same role in the BLOB distance measure that $w_j^2$
plays in the BCLUST distance measure. So $v_1, \ldots, v_{nchan}$ were
set proportional to the optimal values

$$
\frac{1}{w_1^2}, \; \ldots, \; \frac{1}{w_{nchan}^2}
$$

Next, the rotated line and point variances, $v_\ell$ and $v_p$, were set
in relation to each other so that the line standard deviation
$\sqrt{v_\ell}$ and the point standard deviation $\sqrt{v_p}$ represented
the same geographical distance. This ratio is not 1:1 because Landsat
pixels are not square. Allowing for rotation and squaring, it turns out
that $v_p / v_\ell = 1.736$.

The next question was to find a balance between spatial and spectral
weights that would produce good quasi-fields. Larger values of $v_\ell$
and $v_p$ relative to the other v's emphasize the spectral homogeneity
of the clusters; smaller values, the spatial. The criterion of goodness
was defined as the expected variance reduction factor, $R_E$,

$$
R_E = \frac{\sum_i n_i P_i (1 - P_i)}{n \, P (1 - P)}
$$

where

     $P_i$ is the proportion of wheat in Quasi-Field i
     $n_i$ is the number of pixels in Quasi-Field i

     $P$ is the overall wheat proportion in the segment
     $n$ is the number of pixels in the segment

The purity of the quasi-fields with respect to the ground truth is
measured by how small the $R_E$ factor is.
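As an illustration, the R factor can be computed from per-quasi-field pixel counts and wheat proportions. The sketch below assumes the usual stratified-sampling form, in which the pixel-weighted within-field binomial variance is divided by the binomial variance of the whole segment; the function name and the input layout are hypothetical.

```python
from typing import Iterable, Tuple


def variance_reduction_factor(fields: Iterable[Tuple[int, float]]) -> float:
    """Return sum(n_i * P_i * (1 - P_i)) / (n * P * (1 - P)).

    `fields` holds one (n_i, P_i) pair per quasi-field: the pixel count
    and the proportion of wheat. The segment totals n and P are derived
    from the same pairs.
    """
    fields = list(fields)
    n = sum(n_i for n_i, _ in fields)
    p = sum(n_i * p_i for n_i, p_i in fields) / n
    if p <= 0.0 or p >= 1.0:
        raise ValueError("factor is undefined when the segment is all or no wheat")
    within = sum(n_i * p_i * (1.0 - p_i) for n_i, p_i in fields)
    return within / (n * p * (1.0 - p))
```

Under this form, perfectly pure quasi-fields give a factor of 0, and quasi-fields that each merely reproduce the segment-wide proportion give a factor of 1, so smaller values indicate purer quasi-fields.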

To find a suitable balance, we held the spectral weights constant
and compared the R factor for three sets of spatial weights: small,
medium and large. (We do not restrict ourselves by this strategy because
raising all the v's and decreasing $\tau$ by the same factor leaves the
algorithm unchanged.) The comparison was made for eight segments and is
shown in Figure B.1.
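The parenthetical remark about rescaling can be checked directly. The short derivation below assumes the distance is a sum of squared differences, each divided by its weight, and writes $d_v$ for the distance computed with weights $v$; the symbols $d_v$ and the scale factor $c > 0$ are introduced here only for illustration.

```latex
d_{cv} \;=\; \sum_{j=1}^{nchan} \frac{(x_j - \bar{x}_{ij})^2}{c\,v_j}
\;+\; \max\left\{ \frac{(\ell - \bar{\ell}_i)^2}{c\,v_\ell},\;
                  \frac{(p - \bar{p}_i)^2}{c\,v_p} \right\}
\;=\; \frac{1}{c}\, d_v ,
\qquad\text{so}\qquad
d_{cv} > \frac{\tau}{c} \;\Longleftrightarrow\; d_v > \tau .
```

Multiplying every weight by $c$ and dividing the threshold by $c$ therefore leaves every comparison, and hence every assignment of a pixel to a quasi-field, the same. Only the ratios among the weights and the threshold matter.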

The general result is that the R factor is quite stable for all
three settings. The vertical scale has been stretched to show any trend
in the curves. To decide on a parameter setting, we choose one that
these gentle trends indicate is optimal.

The worst case, Segment 1165, has a minimum in the middle, but most
of the segments and the average trend indicate a lower setting. A setting
of $v_\ell = 3.46$ and $v_p = 6.0$, midway between the two lower settings,
was chosen.

A good setting for the parameter $\tau$ is harder to specify. Some
considerations in setting $\tau$ are that when $\tau$ is increased,

     1) the quasi-fields are larger.

     2) there are fewer quasi-fields.

     3) the R factor increases (because the larger the quasi-fields,
        the less pure they are likely to be).

     4) there are fewer "small quasi-fields" (those with no
        interior pixels) which are left out of the stratified
        sampling.

FIGURE B.1 QUASI-FIELD R FACTOR FOR THREE SETS OF SPATIAL WEIGHTS
AND EIGHT SEGMENTS IN KANSAS

The choice of $\tau$ depends on the balance we wish to strike between
considerations (3) and (4). On the one hand, we would like the
quasi-field interiors to be as pure as possible so that we aren't trying
to label a mixture of crops. On the other hand, we don't want to leave
out of the sampling many small quasi-fields and thereby incur a
substantial bias. A thorough job of choosing $\tau$ would require the
trial of several values, observation of the percentage of pixels in
small fields, measurement of the bias incurred by omitting the small
fields (when ground truth is available) and calculation of the R factor
for quasi-field interiors.
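Two of the quantities named above, the percentage of pixels in small quasi-fields and the bias incurred by omitting them, are simple to compute once ground truth is available. The following Python sketch assumes each quasi-field is summarized by its pixel count, its count of wheat pixels and a flag saying whether it has any interior pixel; the function name and the record layout are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def small_field_summary(fields: Iterable[Tuple[int, int, bool]]) -> Dict[str, float]:
    """Summarize the effect of leaving small quasi-fields out of sampling.

    Each record is (n_pixels, n_wheat_pixels, has_interior). A quasi-field
    without interior pixels counts as "small" and is excluded from the
    large-field wheat percentage.
    """
    fields = list(fields)
    n_all = sum(n for n, _, _ in fields)
    w_all = sum(w for _, w, _ in fields)
    n_big = sum(n for n, _, big in fields if big)
    w_big = sum(w for _, w, big in fields if big)
    pct_wheat = 100.0 * w_all / n_all
    pct_wheat_large = 100.0 * w_big / n_big
    return {
        "pct_pixels_small": 100.0 * (n_all - n_big) / n_all,
        "pct_wheat": pct_wheat,
        "pct_wheat_large": pct_wheat_large,
        "bias": pct_wheat_large - pct_wheat,  # negative: large fields underestimate
    }
```

Repeating this summary for several trial values of the threshold would show how quickly the bias shrinks as fewer pixels fall in small quasi-fields.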

For our study of Kansas segments, we chose a constant value (22.0)
of $\tau$ that made the number of large quasi-fields roughly equal to
the number of fields. This value was associated with the "medium"
setting of $v_\ell$ and $v_p$. To obtain comparable results for the
"small" and "large" settings, we chose a $\tau$ for each segment that
produced about the same number of quasi-fields as were obtained with
$\tau = 22.0$ at the medium setting. When a setting halfway between
medium and small was chosen, a corresponding $\tau$ (23.2) was defined.

The BLOB parameters used in the Kansas and North Dakota tests are
given in Table B.2. The BLOB statistics resulting from this choice are
given in Table B.3. A comparison of BLOB statistics for two values of
$\tau$ was made on a North Dakota segment and is shown in Table B.4.

In Table B.3, we observe that the largest bias incurred by leaving
out small quasi-fields is found in the segment with the largest percentage
(35%) of pixels in small quasi-fields. We also observe that the percent
wheat in the small quasi-fields is highly variable and not at all a
representative sample for estimating wheat in the segment. In Kansas, the
small quasi-fields overestimated wheat but in North Dakota the reverse
is true, as shown by Segment 1663 (Table B.4) and by several other North
Dakota segments.
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TABLE B. 2 BLOB PARAMETERS USED IN KANSAS AND NORTH DAKOTA TESTS 


Tasseled-Cap
Channel          Biophase       $v_j$

Brightness          1           25.0
Greenness           1            5.3
Brightness          2           14.0
Greenness           2            9.0
Brightness          3           18.4
Greenness           3            9.0
Brightness          4           21.2
Greenness           4            8.1


Number of
Channels Used     $v_\ell$      $v_p$      $\tau$

      2            10.38        18.0        7.73
      4             5.19         9.0       15.47
      6             3.46         6.0       23.2
      8             2.59         4.5       30.9
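The lower half of Table B.2 can be checked numerically. The Python sketch below computes three quantities for each row; the observation that the spatial weights and the threshold scale with the number of channels is a reading of the tabulated values, not a statement made in the report, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

# (channels used, line weight, point weight, threshold) as listed in Table B.2
TABLE_B2: List[Tuple[int, float, float, float]] = [
    (2, 10.38, 18.0, 7.73),
    (4, 5.19, 9.0, 15.47),
    (6, 3.46, 6.0, 23.2),
    (8, 2.59, 4.5, 30.9),
]


def table_b2_checks(rows: Sequence[Tuple[int, float, float, float]]):
    """Return (point/line weight ratio, line weight x channels,
    threshold / channels) for each row of the table."""
    return [(v_p / v_l, v_l * n, tau / n) for n, v_l, v_p, tau in rows]


if __name__ == "__main__":
    for ratio, scaled_weight, scaled_tau in table_b2_checks(TABLE_B2):
        print(f"{ratio:.3f}  {scaled_weight:.2f}  {scaled_tau:.3f}")
```

In every row the point-to-line weight ratio stays close to 1.736, while the line weight times the number of channels and the threshold divided by the number of channels are each nearly constant. This suggests that the balance between spectral and spatial terms was held fixed as channels were added.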
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TABLE B. 3 KANSAS BLOB STATISTICS 


                        % Wheat      % Wheat      % Pixels     R Factor
                        Large        Small        Small        Quasi-
                        Quasi-       Quasi-       Quasi-       Field
Segment      % Wheat    Fields       Fields       Fields       Interiors

1020          26.1       24.0         43.2          11           0.04
1035          17.7       17.5         18.3          23           0.19
1041          14.4       14.3         15.3          13           0.21
1163           9.3        8.0         13.7          24           0.28
1165           7.1        6.2          8.9          31           0.20
1167          10.1        7.0         15.7          35           0.18
1851          22.8       20.4         33.6          18           0.16
1852          23.4       24.6         15.6          14           0.14
1860          26.1       26.2         25.1          15           0.15
1861          34.9       34.4         42.5           7           0.09
1865          28.5       26.6         34.5          24           0.09
1886          29.7       29.9         28.4          15           0.17
1887          11.4       10.2         17.8          16           0.17

Average       20.18      19.17        24.05         19           0.16

Average Bias             -1.0          3.9

Average Absolute Error    1.1          5.5
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TABLE B.4 COMPARISON OF BLOB STATISTICS FOR SEGMENT 1663, N. D. , 


USING TWO VALUES OF $\tau$

Time Periods
Used            Statistic                              Usual $\tau$   Bigger $\tau$

2 and 3         $\tau$                                    15.47          22.1
                No. of big quasi-fields                  463            381
                % pixels in big quasi-fields              81.4           90.4
                % wheat bias using big quasi-fields        2.3            1.1
                R factor for quasi-field interiors         0.108          0.137
                % purity for quasi-field interiors        92.6           91.9

2, 3 and 4      $\tau$                                    23.2           33.2
                No. of big quasi-fields                  501            434
                % pixels in big quasi-fields              78.7           87.5
                % wheat bias using big quasi-fields        2.2            1.2
                R factor for quasi-field interiors         0.076          0.109
                % purity for quasi-field interiors        93.7           93.1

1, 2, 3 and 4   $\tau$                                    30.9           44.2
                No. of big quasi-fields                  502            452
                % pixels in big quasi-fields              74.0           85.2
                % wheat bias using big quasi-fields        2.4            1.3
                R factor for quasi-field interiors         0.045          0.067
                % purity for quasi-field interiors        94.1           93.8

Table B.4 illustrates the dilemma of choosing a value of $\tau$. The
$\tau$ used in our tests is in the "usual $\tau$" column and values of
$\tau$ nearly half again as large are in the "bigger $\tau$" column. In
terms of reduction of variance (R factor) the usual $\tau$ is superior.
But the bigger $\tau$, with its halving of the number of pixels in small
quasi-fields, cuts the bias in half.

The question is, which is worse, a slight increase in bias or a slight
decrease in purity? We don't have a definite answer to this question.

Another perspective on purity is provided by the "% purity" figure.
This is defined, as a percentage, by

$$
\frac{\sum_i n_i P_i}{\sum_i n_i}
$$

where

     $P_i$ is the proportion of the majority crop in the interiors
     of Quasi-Field i

     $n_i$ is the number of pixels in Quasi-Field i

     The sum is taken over all large Quasi-Fields i in the segment

If all quasi-field interiors were pure, the % purity score would be 100.

If there are only two categories, wheat and non-wheat, the % purity score
cannot fall below P or 1-P, whichever is larger. The difference in purity
scores is very slight.
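The % purity score and its lower bound in the two-category case can be illustrated with a short Python sketch. It assumes the score is the pixel-weighted mean of the majority-crop proportions, multiplied by 100; the function names and the sample numbers are hypothetical.

```python
from typing import Iterable, Tuple


def percent_purity(fields: Iterable[Tuple[int, float]]) -> float:
    """Pixel-weighted mean majority-crop proportion, as a percentage.

    Each record is (n_i, P_i): the pixel count of a large quasi-field and
    the proportion of its majority crop.
    """
    fields = list(fields)
    total = sum(n for n, _ in fields)
    return 100.0 * sum(n * p for n, p in fields) / total


def majority_proportion(p_wheat: float) -> float:
    """With only wheat and non-wheat, the majority share is max(P, 1 - P)."""
    return max(p_wheat, 1.0 - p_wheat)


if __name__ == "__main__":
    # Two equal-sized quasi-fields that are 90% and 20% wheat.
    wheat = [(50, 0.9), (50, 0.2)]
    score = percent_purity([(n, majority_proportion(p)) for n, p in wheat])
    overall = sum(n * p for n, p in wheat) / sum(n for n, _ in wheat)
    print(score, 100.0 * majority_proportion(overall))
```

In this example the score is about 85, well above the floor of about 55 set by the overall wheat proportion. The floor is reached only when every quasi-field has the same majority crop as the segment as a whole.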

The 1% average absolute bias in the Kansas results would seem acceptably
small by most standards, indicating that a sound value of $\tau$ was used.
This judgment is based on the assumption that the bias would be overshadowed
by other larger errors in the system. Whether this bias can be safely
reduced by raising $\tau$ depends on how the labeling accuracy varies with
the purity of the quasi-field interiors, a relationship that has not been
measured, though further investigation of this is warranted.
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APPENDIX C 

PROCEDURE FLOW CHARTS 

This appendix describes the major program flow of Procedure M.
Software flow charts are presented in Table C.1 and program descriptions
in Table C.2.

The procedure is coded primarily in XTRAN, an extended Fortran
compiler developed at ERIM. ERIM's QLINE data processing system provides
the software operating environment. Currently, the software is
configured for use on an AMDAHL 470/V6 operating under the Michigan
Terminal System (MTS).

TABLE C.1 SOFTWARE FLOW CHARTS (Cont'd)


Phase 2: External Effects Correction and Quasi-Field Definition

A. N x 4 TASCAPPED Data

B. 7 Ground Truth Channels

C. N SCREEN Channels

D. 2 BLOB Channels

E. 1 STRIP Channel


Phase 3: Strata Definition

INPUT          PROGRAMS          OUTPUT


Phase 4: Labeling and Estimation

INPUT          PROGRAMS          OUTPUT

C. N SCREEN Channels

D. 2 BLOB Channels

E. 1 STRIP Channel

F. 1 SHIFT Channel

G. 1 Classification Channel
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TABLE C.2 DESCRIPTION OF MODULES AND SUBROUTINES 
USED IN PROCEDURE M 


CONVRT - Converts Lockheed sub-pixel ground truth codes to pixel
         format. Input is Lockheed, one channel, six sub-pixel
         ground truth. Output is 7 channels of ground truth with
         ERIM codes. Channel number 1 is the ground truth code for
         the whole pixel. If all six sub-pixels have the same code,
         then the whole pixel is given that code. If not, a zero
         is assigned to that pixel. Channels 2-7 are the 6 sub-pixel
         codes.

MERGE  - Merges data, i.e., pixel data with ground truth, as needed
         for machine processing. Output is all needed data in
         one file.

SCREEN - Flags bad data, clouds, shadows, etc. Input is merged data
         file. Output is same file with added screen channels, one
         added channel per acquisition of data.

PFEAT  - Calculates spatially varying haze diagnostics which are needed
         for the haze correction algorithm (XSTAR). Input is merged data
         file with screen channels. Output is a separate file
         containing haze diagnostics.

XSTAR  - To apply a spatially varying haze correction to Landsat data.
         Input is merged data file with screen channels and the output
         file from PFEAT. Output is merged data file with pixel values
         corrected for haze.

TASCAP - Performs a Tasselled Cap Transformation on Landsat 2 data.
         Input is merged data file. Output is merged data file with
         transformed data values.

FLDS15 - To correct ground truth labels coded for 15 visited fields.
         Input is merged data files. Output is same files with some
         ground truth codes corrected.

BLOB   - To group pixels into clusters that are spectrally homogeneous
         and spatially contiguous. Input is merged data file. Output
         is same file with 2 BLOB channels added. These channels
         contain the blob number that each pixel is assigned.

STRIP  - Strips off all boundary pixels around each blob. Input is
         data file with BLOB channels. Output is same file with
         strip channel added. Each exterior pixel is flagged with a
         1 in the strip channel.

COMPRS - To compute signatures of MSS data where polygons (blobs) are
         encoded in extra channels (e.g., blob number). Input is data
         file with blob and strip channels. Output consists of 3 files:
         (a) Means of all blobs, (b) Means of blobs with at least one
         interior pixel, called Big Blobs, (c) Ground truth tables.

BCLUST - To group blobs into spectrally similar strata. Input is Big
         Blob file and ground truth tables from COMPRS. Output is
         cluster means and other associated information.

BLIST  - To provide information for selecting training blobs and
         calculate crop percentage estimates. Input is file from
         BCLUST. Output is cluster (STRATA) summaries.

PROCM  - Carries out labelling and proportion estimation as a
         part of a small grains estimation procedure. PROCM calls 9
         subroutines, 4 of which (PBREAD, ALLOC, BLBSEL, ESTIM) are
         in a package called AUTO. The following is a brief
         description of each subroutine:

         ALLOC  - Allocates training blobs.
         BLBSEL - Selects training blobs.
         CLASIF - Carries out spring wheat vs other small
                  grains classification.
         ESTIM  - Computes proportion estimate.
         GTREAD - Creates blob labels from ground truth tables.
         PBREAD - Reads in necessary information for allocation,
                  selection, and estimation routines.
         SHFBLB - Carries out field shift on blob means.
         SHFPIX - Carries out pixel-by-pixel shift.
         TIME   - MTS subroutine to provide date and time
                  information.

         Input is merged data file with blob and strip channels, plus
         Big Blob and ground truth files from COMPRS, and STRATA
         summary from BLIST. Output is file for statistical analysis
         and the data file with 2 added channels: (a) Shift channel,
         (b) Classification channel.

TWOWAY - To produce a two-way table for comparison of the occurrence of
         specified values in two tape channels. Input is data file
         with added channels from PROCM. Output is the table.

MISCELLANEOUS SUBROUTINES


TRUTH  - To produce ground truth tables from Lockheed or ERIM
         ground truth tables. Called by "COMPRS".

GTREAD - To read ground truth produced by "TRUTH". Called by
         "PROCM" and "BSTUFF".

GTSNIF - To read header record of ground truth tables and pass
         information through common.

BSTUFF - Stand-alone routine to provide diagnostic information
         about blobs.

MXMPY  - To multiply matrices.

MOVER  - To move data from one array to another.

UNTASS - Perform inverse Tasselled Cap Transformation.
         Called by "XSTAR" and "PFEAT".

GAMMA  - To calculate gamma for "XSTAR".

XCOEFF - To calculate multiplicative and additive coefficients for
         "XSTAR".

SATCOR - To return diagonal matrix and additive vector for transforming
         Landsat data to Landsat 2 LACIE segment calibration. Called
         by "SCREEN".

SUNCOR - Perform cosine sun angle correction on 4 x 4 multiplicative
         transformation matrix. Called by "SCREEN".

TASSEL - Perform tasselled cap rotation on 4 x 4 multiplicative
         transformation matrix. Called by "SCREEN".

UNCOR  - To undo cosine sun angle correction performed by "SUNCOR".
         Called by "PFEAT".

PCILE  - Computes percentile points of histograms. Used for computing
         certain scene parameters in PFEAT, such as the mean of the
         green arm, the mean of soils, etc.

URAND  - System subroutine to produce random numbers.

RANKP  - Computes for a vector the rank of each number in the
         vector. It is used to order a listing of clusters in
         BCLUST according to size.

VPROD  - Computes the inner product of two vectors. Used in
         BCLUST.

RANSUB - Generates a random subset of the integers 1, ..., N.
         Used to get a random sample of blobs in BLBSEL.
         (The first blob is chosen with probability proportional
         to size. The others are chosen with equal probability
         by calling RANSUB.)

ZERO   - To zero arrays, written in IBM 370 Assembler language.


I/O FORMAT SERVICE ROUTINES (FSR's)


IUNIV  - Input routine to read universal formatted tapes or files.

ILEC   - Modification of IUNIV, used to read Lockheed ground truth
         data tapes.

OUNIV  - Output routine, writes universal output to tapes or files.

IMSS   - Input routine to read multispectral formatted data files.

OMSS   - Output routine to write multispectral formatted data files.

ILFILE - Input routine to read COMPRS output files.
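The STRIP and COMPRS entries in Table C.2 together determine which blobs count as "Big Blobs" (those with at least one interior pixel). The Python sketch below illustrates one way such a step could work. It is not the XTRAN code: the 4-neighbor definition of a boundary pixel and the treatment of the image edge as a boundary are assumptions, and the function names are hypothetical.

```python
from typing import List, Set


def strip_channel(blob: List[List[int]]) -> List[List[int]]:
    """Flag boundary pixels with 1 and interior pixels with 0.

    A pixel is treated as boundary when any of its four neighbors lies
    outside the image or carries a different blob number (assumption).
    """
    rows, cols = len(blob), len(blob[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < rows and 0 <= cc < cols)
                if outside or blob[rr][cc] != blob[r][c]:
                    out[r][c] = 1
                    break
    return out


def big_blobs(blob: List[List[int]], strip: List[List[int]]) -> Set[int]:
    """Blob numbers that keep at least one interior (unflagged) pixel."""
    return {blob[r][c]
            for r in range(len(blob))
            for c in range(len(blob[0]))
            if strip[r][c] == 0}
```

For a 4-line image in which blob 1 is three pixels wide and blob 2 is two pixels wide, only blob 1 retains interior pixels under these assumptions, so blob 2 would be a small quasi-field left out of the stratified sampling.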
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APPENDIX D 

DATA BASE FOR TESTING AND EVALUATING PROCEDURE M 

Twenty-three Phase 3 and five Phase 2 LACIE blind sites were
initially selected for the testing of Procedure M. Acquisitions for
each blind site were chosen to best represent the growing season of
spring small grains. The acquisitions for each site were merged into
28 channels of data according to the list in Table D.1.

The ground truth for each site was merged on a pixel-by-pixel
basis with the acquisition data, forming the next seven channels. The
ground truth codes were converted from the subpixel ground truth codes
produced by LEC to an alternative code and format. Each set of six
subpixels was inter-compared; if all codes were the same, the appropriate
value was placed in Channel 29, otherwise a zero was inserted to
indicate that the pixel was not pure. Channels 30 through 35 contain ground
truth codes for each subpixel. One channel per acquisition was added
(Channels 36-42) to flag data that were rejected by the SCREEN algorithm.
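The subpixel comparison rule described above is simple enough to sketch. The Python fragment below is an illustration only; the function names are hypothetical, and the actual code values used by LEC and ERIM are not reproduced here.

```python
from typing import List, Sequence


def whole_pixel_code(subpixel_codes: Sequence[int]) -> int:
    """Return the common code of the six subpixels, or 0 if they differ."""
    if len(subpixel_codes) != 6:
        raise ValueError("expected six subpixel ground truth codes")
    first = subpixel_codes[0]
    return first if all(code == first for code in subpixel_codes) else 0


def ground_truth_channels(subpixel_codes: Sequence[int]) -> List[int]:
    """Seven ground truth values for one pixel: the whole-pixel code
    (Channel 29) followed by the six subpixel codes (Channels 30-35)."""
    return [whole_pixel_code(subpixel_codes), *subpixel_codes]
```

With seven acquisitions of four channels each, the merged file then holds 28 spectral channels, these 7 ground truth channels and 7 screen channels, for 42 channels in all.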

Of the acquisitions available, those listed in Table D.2 were used
for clustering and stratifying the data. Segment partitions that were
evaluated in Section 2 are listed in Table D.3.

TABLE D.1 SEGMENTS SELECTED AND PREPARED FOR ANALYSIS


PHASE 3

                                     Channels
Site (State)     1-4     5-8     9-12    13-16   17-20   21-24   25-28

1104 (Mont)      128     146     164     182     199     zero*   236
1498 (SD)        120     zero    157     174     193     210     zero
1512 (Minn)      120     zero    156     174     193     zero    zero
1513 (Minn)†     zero    140     157     175     193     zero    zero
1515 (Minn)      zero    zero    157     175     193     zero    zero
1520 (Minn)      120     zero    156     174     192     zero    zero
1602 (ND)        125     143     zero    179     197     216     zero
1606 (ND)        125     143     zero    179     197     zero    zero
1625 (ND)        125     143     zero    179     197     zero    233
1640 (ND)        121     140     zero    175     193     211     229
1652 (ND)        125     143     zero    179     197     zero    233
1663 (ND)        121     139     157     175     193     211     229
1669 (SD)        125     143     161     179     197     215     zero
1681 (SD)        120     139     156     174     192     230     zero
1681 (SD)**      120     139     157     175     193     210     zero
1699 (SD)        zero    140     158     176     194     zero    230
1800 (SD)        120     zero    156     174     192     210     zero
1803 (SD)        123     142     159     178     195     213     zero
1805 (SD)        zero    zero    158     176     193     211     zero
1811 (SD)        120     138     157     174     192     210     zero
1899 (ND)        122     140     157     175     193     zero    zero
1913 (ND)        125     143     161     179     197     215     233
1927 (ND)        122     140     158     176     194     zero    230
1927 (ND)**      121     140     157     175     193     zero    230
1929 (Mont)†     129     147     zero    184     201     220     zero


PHASE 2

                                     Channels
Site (State)     1-4     5-8     9-12    13-16   17-20   21-24   25-28

1614 (ND)        129     zero*   zero    183     201     219     zero
1633 (ND)        128     147     zero    182     201     zero    237
1637 (ND)        129     147     zero    182     201     219     237
1642 (ND)        127     145     163     182     199     zero    236
1662 (ND)        127     145     163     zero    199     217     236


*Zero indicates that no acquisition was available, and four channels
of zeros were merged to keep the files uniform.

**For two segments, consecutive-day coverage permitted the merging
of two substantially different sets of acquisitions.

†Two sites were eliminated from analysis due to inadequate ground
truth codes designated for strip fields which dominated the scene.

TABLE D.2 ACQUISITIONS USED FOR SPATIAL FEATURE
DEFINITION AND STRATIFICATION


PHASE 3 SITES

Site        Acquisition Dates

1104        146     182     199     236
1498        120     182     193     210
1512        156     174     193
1513        157     175     193
1515        157     175     193
1520        120     174     192
1602        143     179     197     216
1606        143     179     197
1625        143     179     197     233
1640        140     175     193     211
1652        143     179     197     233
1663        139     175     193     211
1669        143     179     197     215
1681        139     174     192     210
1681        139     175     193     210
1699        140     176     194     230
1800        120     174     192     210
1803        123     159     178     195
1805        158     176     193     211
1811        138     174     192     210
1899        140     157     175     193
1913        143     179     197     215
1927        140     176     194     230
1927        140     175     193     230
1929        147     184     201     220


PHASE 2 SITES

Site        Acquisition Dates

1614        129     183     201     219
1633        147     182     201     237
1637        147     182     201     219
1642        145     163     199     236
1662        145     163     199     217

TABLE D.3 SEGMENT PARTITIONS


Problem 


Phase 3 

Phase 2 

Developmental 

Segments 

Red River 

1669 

1614 

1498 

1637 

1498 

1681 

1633 

1515 

1652 

1512 

1699 

1637 

1640 

1662 

1515 

1800 

1642 

1663 

1913 

1520 

1803 

1662 



1640 

1805 




1663 

1811 




1681 

1913 




1927 


1927 

1513 

1606 

1899 

1104 

1498 

1512 

1515 

1520 

1602 

1640 

1652 

1663 
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APPENDIX E 

SEGMENT-BY-SEGMENT LABELER RESULTS 


PHASE 3 SEGMENTS WITH GOOD LABELING ACCURACY (> 50%)

             SPRING WHEAT         BARLEY               BOTH                 OATS
SEGMENT    NUMBER  % RIGHT     NUMBER  % RIGHT     NUMBER  % RIGHT     NUMBER  % RIGHT

1512         029     72.5        008     68.5        877     70.5        200     22.1
1520         392     7?.0         35     88.6        027     73.8        170     30.6
1602         660     82.0         06     50.0        706     79.9         50     68.0
1606         113     72.6        ?'c     27.3        135     65.2         20     20.0
1625         290     76.5         12     91.7        306     77.1         12     50.0
1681        1009     56.9        206     97.6       1255     63.6       1007     60.7
1869         162     72.2        099     av.o        661     79.1          0      0.0
1927        1112     75.2        380     88.7       1092     78.6        223     21.1


PHASE 2 SEGMENTS WITH GOOD LABELING ACCURACY (> 50%)

             SPRING WHEAT         BARLEY               BOTH                 OATS
SEGMENT    NUMBER  % RIGHT     NUMBER  % RIGHT     NUMBER  % RIGHT     NUMBER  % RIGHT

1610         861     52.0        266     90.2       1127     61.0        100     59.6
1633        1280     60.7         87     88.5       1371     62.5       1289     39.5
1602        1070     58.7        365     51.8       1039     56.9       1159     39.6


PHASE 3 SEGMENTS WITH POOR LABELING ACCURACY (< 50%)

             SPRING WHEAT         BARLEY               BOTH                 OATS
SEGMENT    NUMBER  % RIGHT     NUMBER  % RIGHT     NUMBER  % RIGHT     NUMBER  % RIGHT

1652         255     16.9          8     37.5        263     17.5          0      0.0
1669         156      1.3         38    100.0        190     20.6         36     97.2
1699         623      7.7        300     99.0        923     37.0        561     99.5
1913         020     10.1         17    100.0        001     13.6         82     96.3
1929         517      0.2        302    100.0        819     37.0         57     82.5


PHASE 2 SEGMENTS WITH POOR LABELING ACCURACY (< 50%)

             SPRING WHEAT         BARLEY               BOTH                 OATS
SEGMENT    NUMBER  % RIGHT     NUMBER  % RIGHT     NUMBER  % RIGHT     NUMBER  % RIGHT

1637        1895     20.0         06     91.3       1901     26.0       1900     75.8
1662        2902     38.3        290     71.0       3192     01.3        298     86.6


SEGMENTS USED TO DEVELOP MACHINE LABELING CRITERION


             SPRING WHEAT         BARLEY               BOTH                 OATS
SEGMENT    NUMBER  % RIGHT     NUMBER  % RIGHT     NUMBER  % RIGHT     NUMBER  % RIGHT

1498         619     66.1        200     76.0        823     68.5        902     66.7
1515        1278     62.9        933     80.2       2211     83.5         09     18,u
1600        2022     80.9        863     71.0       2685     80.8          0      0.0
1663        1263     90.3        509     71.0       1812     87.0        207     39.6


SEGMENTS NOT USED IN LABELER PERFORMANCE ANALYSIS

             SPRING WHEAT         BARLEY               BOTH                 OATS
SEGMENT    NUMBER  % RIGHT     NUMBER  % RIGHT     NUMBER  % RIGHT     NUMBER  % RIGHT

1100           0      0.0        155    100.0        155    100.0        161    100.0
1513           1    100.0         15    100.0         16    100.0          0      0.0
1800          18     27.8        376     97.1        390     93.9       1360     95.6
1803           0      0.0          0      0.0          0      0.0          0      0.0
1805          12      8.3          9    100.0         21     97.6        589     85.1
1811           1    100.0          0      0.0          1    100.0        108     18.5